Optimal Query Processing Meets Information Theory

Optimal Query Processing Meets Information Theory
Dan Suciu, University of Washington
Joint work with Mahmoud Abo-Khamis and Hung Ngo, RelationalAI Inc.

Basic Question
What is the optimal runtime to compute a query Q on a database D?
Data complexity: Q is fixed, runtime = f(D).
Applications:
  CSP: the nodes of Q are the variables of the CSP, its hyperedges are the constraints, and D represents the valid assignments.
  Databases: …

Example
Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X)
Will prove later: |Q(D)| ≤ N^{3/2}
In SQL:
  select *  -- natural join
  from R, S, T
  where R.Y = S.Y and S.Z = T.Z and T.X = R.X
Instance: R(X,Y), with S(Y,Z) and T(Z,X) the same as R; values range over 1, 2, 3, ..., N/2 (table cells not recoverable from the transcript); input size N.
Query plan: (R(X,Y) ⨝ S(Y,Z)) ⨝ T(Z,X)
  R ⨝ S: Θ(N^2) tuples
  final output: 0 tuples
Every plan costs Θ(N^2)!
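The blow-up on this slide can be reproduced on a small scale. The sketch below is only an illustration: the slide's table cells did not survive, so the instance used here (R = S = T = {(0, i)} ∪ {(i, 0)} for i = 1..N/2) is an assumption chosen to match the counts on the slide, and the helper names are hypothetical.

```python
def make_relation(n):
    """Assumed instance: n tuples, half of the form (0, i) and half (i, 0)."""
    half = n // 2
    return {(0, i) for i in range(1, half + 1)} | {(i, 0) for i in range(1, half + 1)}


def join_on_shared(left, right):
    """Join left(A, B) with right(B, C) on B, returning (a, b, c) triples."""
    by_b = {}
    for b, c in right:
        by_b.setdefault(b, []).append(c)
    return {(a, b, c) for a, b in left for c in by_b.get(b, [])}


def plan_sizes(n):
    """Return (|R join S|, |Q(D)|) for the plan (R join S) join T."""
    r = s = t = make_relation(n)
    rs = join_on_shared(r, s)                            # triples (x, y, z)
    out = {(x, y, z) for x, y, z in rs if (z, x) in t}   # filter with T(Z, X)
    return len(rs), len(out)


if __name__ == "__main__":
    for n in (8, 100, 1000):
        print(n, plan_sizes(n))
```

On this assumed instance the intermediate join has (N/2)^2 + N/2 tuples while the final output is empty. Because R, S and T are identical, the other two join orders behave the same way.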

Outline
  Motivation
  Full CQ
  Boolean CQ
  Conclusions

Maximum Output Size
max_{D satisfies stats} |Q(D)|
E.g. R(X,Y) ∧ S(Y,Z), with |R|, |S| ≤ N:
  No other info: |Q(D)| ≤ N^2
  S.Y is a key: |Q(D)| ≤ N
  S.Y has degree ≤ d: |Q(D)| ≤ d×N
E.g. R(X,Y) ∧ S(Y,Z) ∧ T(Z,X):
  No other info: |Q(D)| ≤ N^{3/2}
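The three bounds for R(X,Y) ∧ S(Y,Z) can be checked on tiny hand-made instances. This is a minimal sketch; the relations `r`, `s_worst`, `s_key` and `s_deg` are made-up examples, not data from the talk.

```python
def join_size(r, s):
    """|R(X,Y) join S(Y,Z)|, computed from the degree of each Y value in S."""
    deg_s = {}
    for y, _z in s:
        deg_s[y] = deg_s.get(y, 0) + 1
    return sum(deg_s.get(y, 0) for _x, y in r)


n = 6
d = 2
r = {(x, 0) for x in range(n)}                 # all of R uses Y = 0
s_worst = {(0, z) for z in range(n)}           # no constraint: all of S uses Y = 0
s_key = {(y, y) for y in range(n)}             # Y is a key of S
s_deg = {(0, 0), (0, 1), (1, 2), (1, 3), (2, 4), (2, 5)}  # each Y value occurs at most d times

if __name__ == "__main__":
    print(join_size(r, s_worst))  # reaches N^2
    print(join_size(r, s_key))    # at most N
    print(join_size(r, s_deg))    # at most d * N
```

Each statistic on S caps how many S tuples a single R tuple can match, which is exactly what the degree dictionary measures.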

Main Principle
Find an information-theoretic proof of the upper bound, or of the submodular width.
Convert the proof into an algorithm.

Proof of Upper Bound
Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X)
Database D → entropic function H
[Tables: database D with relations R(X,Y), S(Y,Z), T(Z,X), and the output Q(D). Each output tuple has probability 1/5; the induced marginal probabilities on the tuples of R, S, T are 2/5 or 1/5. Cell values are not recoverable from the transcript.]
H(XYZ) = log |Q(D)|
H(XY) ≤ log N_R
H(YZ) ≤ log N_S
H(Z|Y) ≤ log deg_S(z|y)
H(XZ) ≤ log N_T
Cardinalities, functional dependencies, max degrees
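The construction "uniform distribution on Q(D), then read off marginal entropies" can be tried on a toy database. The relations below are invented for illustration (they are not the slide's tables, which were lost), and `entropy` is a hypothetical helper.

```python
import math
from collections import Counter


def entropy(tuples, positions):
    """Entropy in bits of the marginal on `positions`, under the uniform
    distribution over the given list of distinct tuples."""
    counts = Counter(tuple(t[i] for i in positions) for t in tuples)
    total = len(tuples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Made-up database D.
R = {("a", 3), ("a", 2), ("b", 2), ("d", 3)}
S = {(3, "m"), (2, "q"), (2, "m")}
T = {("m", "a"), ("q", "a"), ("q", "b"), ("m", "d")}

# Output of the triangle query, as (x, y, z) tuples.
Q = sorted((x, y, z) for x, y in R for y2, z in S if y == y2 and (z, x) in T)

X, Y, Z = 0, 1, 2
if __name__ == "__main__":
    print(len(Q), entropy(Q, (X, Y, Z)), math.log2(len(Q)))
    print(entropy(Q, (X, Y)), "<=", math.log2(len(R)))
    print(entropy(Q, (Y, Z)), "<=", math.log2(len(S)))
    print(entropy(Q, (X, Z)), "<=", math.log2(len(T)))
```

The marginal on XY is supported on R, so its entropy cannot exceed log |R|; the same argument gives the other cardinality constraints. The conditional entropy H(Z|Y) = H(YZ) - H(Y) is bounded in the same way by the log of the maximum degree of Y in S.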

Proof of Upper Bound
Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X)
|R|, |S|, |T| ≤ N ⇒ |Q(D)| ≤ N^{3/2}
  3 log N ≥ h(XY) + h(YZ) + h(XZ)
          ≥ h(XYZ) + h(Y) + h(XZ)      (submodularity)
          ≥ h(XYZ) + h(XYZ) + h(∅)     (submodularity)
          = 2 h(XYZ) = 2 log |Q(D)|
Shearer's inequality: h(XY) + h(YZ) + h(XZ) ≥ 2 h(XYZ)
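Each step of this chain can be sanity-checked numerically on an arbitrary joint distribution over (X, Y, Z). The sketch below is only a check, not a proof; the function names and the random test distribution are assumptions made for illustration.

```python
import itertools
import math
import random


def marginal_entropy(p, positions):
    """Entropy in bits of the marginal of joint distribution p on `positions`."""
    marg = {}
    for outcome, prob in p.items():
        key = tuple(outcome[i] for i in positions)
        marg[key] = marg.get(key, 0.0) + prob
    return -sum(q * math.log2(q) for q in marg.values() if q > 0)


def shearer_chain(p):
    """Return the four quantities of the chain, from left to right."""
    def h(*pos):
        return marginal_entropy(p, pos)

    X, Y, Z = 0, 1, 2
    return [
        h(X, Y) + h(Y, Z) + h(X, Z),
        h(X, Y, Z) + h(Y) + h(X, Z),     # submodularity on XY and YZ
        h(X, Y, Z) + h(X, Y, Z) + h(),   # submodularity on Y and XZ
        2 * h(X, Y, Z),                  # h(empty set) = 0
    ]


def random_distribution(rng, size=3):
    """Random joint distribution over {0..size-1}^3."""
    outcomes = list(itertools.product(range(size), repeat=3))
    weights = [rng.random() for _ in outcomes]
    total = sum(weights)
    return {o: w / total for o, w in zip(outcomes, weights)}


if __name__ == "__main__":
    print(shearer_chain(random_distribution(random.Random(0))))
```

The printed list should be non-increasing, with the last two entries equal up to floating-point error.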

Proof to Algorithm
Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X)
h(XY) + h(YZ) + h(XZ) ≥ 2 h(XYZ)
Proof:
  h(XY) + h(YZ) + h(XZ)
  → h(XYZ) + h(Y) + h(XZ)
  → h(XYZ) + h(XYZ)
Algorithm:
  R(X,Y) ∧ S(Y,Z) ∧ T(Z,X)
  → R_light(X,Y) ∧ S(Y,Z)    [N^{3/2}]
  ∪ R_heavy(Y) ∧ T(X,Z)      [N^{3/2}]
R_light or R_heavy: degree(Y) ≤ or > N^{1/2}
Runtime Õ(N^{3/2})
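The light/heavy split on this slide can be sketched directly in code. This is a minimal illustration under stated assumptions: the function name `triangles`, the use of hash sets for membership tests, and the threshold `isqrt(N)` are choices made here, not details given in the talk.

```python
import math
from collections import defaultdict


def triangles(R, S, T):
    """Return all (x, y, z) with (x, y) in R, (y, z) in S and (z, x) in T,
    splitting the Y values of R into light and heavy by degree."""
    R, S, T = set(R), set(S), set(T)
    n = max(len(R), len(S), len(T), 1)
    threshold = math.isqrt(n)            # roughly N^(1/2)

    xs_by_y = defaultdict(list)          # index of R by its Y value
    for x, y in R:
        xs_by_y[y].append(x)
    heavy = {y for y, xs in xs_by_y.items() if len(xs) > threshold}

    out = set()
    # Light part: R_light(X,Y) join S(Y,Z), then check T.
    # Each S tuple is extended by at most sqrt(N) values of x.
    for y, z in S:
        if y in heavy:
            continue
        for x in xs_by_y.get(y, ()):
            if (z, x) in T:
                out.add((x, y, z))
    # Heavy part: R_heavy(Y) x T(Z,X), then check R and S.
    # There are fewer than sqrt(N) heavy values of y.
    for y in heavy:
        for z, x in T:
            if (x, y) in R and (y, z) in S:
                out.add((x, y, z))
    return out


if __name__ == "__main__":
    R = {(1, 2), (1, 3), (4, 2)}
    S = {(2, 5), (3, 5)}
    T = {(5, 1), (5, 4)}
    print(sorted(triangles(R, S, T)))
```

Both loops do O(N^{3/2}) membership tests, so with hashing the sketch stays within the runtime stated on the slide, up to the cost of building the index.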

Another Quick Example
Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,U) ∧ A(X,Z,U) ∧ B(X,Y,U)
Runtime Õ(N^{3/2})
  3 log N ≥ h(XY) + h(YZ) + h(XZ) + h(U|XZ) + h(X|YU)
          ≥ h(XY) + h(YZ) + h(XZ) + h(U|XYZ) + h(X|YZU)
Proof steps, as grouped on the slide:
  h(XYZ) + h(U|XYZ) → h(XYZU)
  h(Y) + h(XZ) + h(X|YZU) → h(XYZ) + h(X|YZU) → h(XYZU)

Enumeration Problem: Discussion
Cardinalities [Atserias, Grohe, Marx '08; Ngo, Ré, Rudra '13]:
  Entropic bound = polymatroid bound
  Algorithm for Q(D) has a single log factor
Cardinalities + FDs + max degrees:
  Entropic bound ≨ polymatroid bound
  Algorithm for Q(D) has a polylog factor

Background: Tree Decomposition
Informally: TD = a tree where each node t represents an enumeration problem.
Fractional hypertree width [Grohe, Marx '14]: min_{tree} max_{node t} max_D
Submodular width [Marx 2013]: max_D min_{tree} max_{node t}

Q() = ∃x∃y∃z∃u R(x,y) ∧ S(y,z) ∧ T(z,u) ∧ K(u,x)
|R|, |S|, |T|, |K| ≤ N
O(N^{3/2}) algorithm [Alon, Yuster, Zwick '97]
min_{tree} max_{node t} max_D
Tree decompositions:
  bags {R(x,y), S(y,z)} and {T(z,u), K(u,x)}
  bags {S(y,z), T(z,u)} and {K(u,x), R(x,y)}
Runtime Õ(N^2) (suboptimal)

Q() = ∃x∃y∃z∃u R(x,y) ∧ S(y,z) ∧ T(z,u) ∧ K(u,x)
|R|, |S|, |T|, |K| ≤ N
O(N^{3/2}) algorithm [Alon, Yuster, Zwick '97]
max_D min_{tree} max_{node t}
Tree decompositions:
  T1: bags {R(x,y), S(y,z)} and {T(z,u), K(u,x)}
  T2: bags {S(y,z), T(z,u)} and {K(u,x), R(x,y)}
min( max(h(xyz), h(zux)), max(h(yzu), h(uxy)) )
  = max( min(h(xyz), h(yzu)), min(h(xyz), h(uxy)), min(h(zux), h(yzu)), min(h(zux), h(uxy)) )
  ≤ 3/2 log N
For the term min(h(xyz), h(yzu)):
  3 log N ≥ h(xy) + h(yz) + h(zu)
          ≥ h(xyz) + h(y) + h(zu)
          ≥ h(xyz) + h(yzu) + h(∅)
          ≥ 2 min(h(xyz), h(yzu))
subw(Q) = 3/2 log N
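The rewriting of "min over trees of max over bags" as "max, over one bag chosen from each tree, of min" is the distributive law for min and max. A quick numeric check, with hypothetical function names and arbitrary numbers standing in for the entropy values:

```python
import itertools
import random


def min_of_max(t1, t2):
    """min over the two trees of the max over each tree's bags."""
    return min(max(t1), max(t2))


def max_of_min(t1, t2):
    """max over all ways of picking one bag per tree of the min of the picks."""
    return max(min(a, b) for a, b in itertools.product(t1, t2))


if __name__ == "__main__":
    rng = random.Random(1)
    t1 = (rng.random(), rng.random())   # stands in for h(xyz), h(zux)
    t2 = (rng.random(), rng.random())   # stands in for h(yzu), h(uxy)
    print(min_of_max(t1, t2), max_of_min(t1, t2))
```

The max-of-min form is what makes the bound tractable: it is enough to bound each of the four min terms separately, and the chain on the slide does this for one of them.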

Proof to Algorithm
Use the proof of:
  h(xyz) + h(yzu) ≤ h(xy) + h(yz) + h(zu)
to compute the disjunctive datalog rule:
  A(x,y,z) ∨ B(y,z,u) ← R(x,y) ∧ S(y,z) ∧ T(z,u)
(details omitted)
Runtime Õ(N^{3/2})
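The slide omits the details, so the following is only one plausible reading, built by analogy with the light/heavy split used for the triangle query; it is not claimed to be the authors' general procedure. The name `disjunctive_rule` and the threshold `isqrt(N)` are assumptions.

```python
import math
from collections import defaultdict


def disjunctive_rule(R, S, T):
    """Return (A, B) such that every (x, y, z, u) with R(x,y), S(y,z), T(z,u)
    satisfies (x, y, z) in A or (y, z, u) in B."""
    R, S, T = set(R), set(S), set(T)
    n = max(len(R), len(S), len(T), 1)
    threshold = math.isqrt(n)             # roughly N^(1/2)

    xs_by_y = defaultdict(list)           # index of R by y
    for x, y in R:
        xs_by_y[y].append(x)
    us_by_z = defaultdict(list)           # index of T by z
    for z, u in T:
        us_by_z[z].append(u)

    A, B = set(), set()
    for y, z in S:
        xs = xs_by_y.get(y, ())
        if len(xs) <= threshold:
            # Light y: extend (y, z) by at most sqrt(N) values of x.
            A.update((x, y, z) for x in xs)
        else:
            # Heavy y: fewer than sqrt(N) such values, each paired with T.
            B.update((y, z, u) for u in us_by_z.get(z, ()))
    return A, B


if __name__ == "__main__":
    R = {(x, 0) for x in range(10)} | {(x, x % 3 + 1) for x in range(6)}
    S = {(y, z) for y in range(4) for z in range(4)}
    T = {(z, u) for z in range(4) for u in range(4)}
    A, B = disjunctive_rule(R, S, T)
    print(len(A), len(B))
```

In this sketch A holds at most N·N^{1/2} tuples because each S tuple is extended by at most N^{1/2} light partners, and B is about the same size because fewer than N^{1/2} values of y are heavy and each is paired with at most |T| tuples.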

Conclusions
Query evaluation summary:
  Information theory → Proof
  Proof → Algorithm
Open problems:
  Better “Proof → Algorithm”
  Fine-grained lower bounds

Thank You! Questions?