Optimal Query Processing Meets Information Theory Dan Suciu – University of Washington Joint w/ Mahmoud Abo-Khamis and Hung Ngo RelationalAI Inc
Basic Question What is the optimal runtime to compute a query Q on a database D? Data complexity: Q is fixed, runtime = f(D) Applications: CSP where the nodes of Q are the variables of the CSP problem, its hyperedges are the constraints, and D represents the valid assignments Databases: …
Example select * -- natural join from R, S, T Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) Will prove later: |Q(D)| ≤ N3/2 select * -- natural join from R, S, T where R.Y = S.Y and S.Z = T.Z and T.X = R.X
Example select * -- natural join from R, S, T Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) Will prove later: |Q(D)| ≤ N3/2 select * -- natural join from R, S, T where R.Y = S.Y and S.Z = T.Z and T.X = R.X Query plan ⨝ ⨝ T(Z,X) R(X,Y) S(Y,Z)
Example select * -- natural join from R, S, T Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) Will prove later: |Q(D)| ≤ N3/2 select * -- natural join from R, S, T where R.Y = S.Y and S.Z = T.Z and T.X = R.X R: X Y 1 2 3 ... N/2 Query plan ⨝ ⨝ T(Z,X) N R(X,Y) S(Y,Z)
Example select * -- natural join from R, S, T Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) Will prove later: |Q(D)| ≤ N3/2 select * -- natural join from R, S, T where R.Y = S.Y and S.Z = T.Z and T.X = R.X R: X Y 1 2 3 ... N/2 S: (same as R) Y Z 1 2 3 ... N/2 T: (same as R) Z X 1 2 3 ... N/2 Query plan ⨝ ⨝ T(Z,X) N R(X,Y) S(Y,Z)
Example select * -- natural join from R, S, T Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) Will prove later: |Q(D)| ≤ N3/2 select * -- natural join from R, S, T where R.Y = S.Y and S.Z = T.Z and T.X = R.X R: X Y 1 2 3 ... N/2 S: (same as R) Y Z 1 2 3 ... N/2 T: (same as R) Z X 1 2 3 ... N/2 Query plan ⨝ Θ(N2) tuples ⨝ T(Z,X) N R(X,Y) S(Y,Z)
Example select * -- natural join from R, S, T Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) Will prove later: |Q(D)| ≤ N3/2 select * -- natural join from R, S, T where R.Y = S.Y and S.Z = T.Z and T.X = R.X 0 tuples R: X Y 1 2 3 ... N/2 S: (same as R) Y Z 1 2 3 ... N/2 T: (same as R) Z X 1 2 3 ... N/2 Query plan ⨝ Θ(N2) tuples ⨝ T(Z,X) N R(X,Y) S(Y,Z)
Example select * -- natural join from R, S, T Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) Will prove later: |Q(D)| ≤ N3/2 select * -- natural join from R, S, T where R.Y = S.Y and S.Z = T.Z and T.X = R.X 0 tuples R: X Y 1 2 3 ... N/2 S: (same as R) Y Z 1 2 3 ... N/2 T: (same as R) Z X 1 2 3 ... N/2 Query plan ⨝ Θ(N2) tuples ⨝ T(Z,X) N R(X,Y) S(Y,Z) Every plan costs Θ(N2)!
Outline Motivation Full CQ Boolean CQ Conclusions
Maximum Output Size maxD satisfies stats (|Q(D)|) E.g. R(X,Y) ∧ S(Y,Z), |R|, |S| ≤ N No other info: |Q(D)| ≤ N2 S.Y is a key: |Q(D)| ≤ N S.Y has degree ≤ d: |Q(D)| ≤ d×N E.g. R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) No other info: |Q(D)| ≤ N3/2
Maximum Output Size maxD satisfies stats (|Q(D)|) E.g. R(X,Y) ∧ S(Y,Z), |R|, |S| ≤ N No other info: |Q(D)| ≤ N2 S.Y is a key: |Q(D)| ≤ N S.Y has degree ≤ d: |Q(D)| ≤ d×N E.g. R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) No other info: |Q(D)| ≤ N3/2
Maximum Output Size maxD satisfies stats (|Q(D)|) E.g. R(X,Y) ∧ S(Y,Z), |R|, |S| ≤ N No other info: |Q(D)| ≤ N2 S.Y is a key: |Q(D)| ≤ N S.Y has degree ≤ d: |Q(D)| ≤ d×N E.g. R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) No other info: |Q(D)| ≤ N3/2
Maximum Output Size maxD satisfies stats (|Q(D)|) E.g. R(X,Y) ∧ S(Y,Z), |R|, |S| ≤ N No other info: |Q(D)| ≤ N2 S.Y is a key: |Q(D)| ≤ N S.Y has degree ≤ d: |Q(D)| ≤ d×N E.g. R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) No other info: |Q(D)| ≤ N3/2
Maximum Output Size maxD satisfies stats (|Q(D)|) E.g. R(X,Y) ∧ S(Y,Z), |R|, |S| ≤ N No other info: |Q(D)| ≤ N2 S.Y is a key: |Q(D)| ≤ N S.Y has degree ≤ d: |Q(D)| ≤ d×N E.g. R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) No other info: |Q(D)| ≤ N3/2
Maximum Output Size maxD satisfies stats (|Q(D)|) E.g. R(X,Y) ∧ S(Y,Z), |R|, |S| ≤ N No other info: |Q(D)| ≤ N2 S.Y is a key: |Q(D)| ≤ N S.Y has degree ≤ d: |Q(D)| ≤ d×N E.g. R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) No other info: |Q(D)| ≤ N3/2
Maximum Output Size maxD satisfies stats (|Q(D)|) E.g. R(X,Y) ∧ S(Y,Z), |R|, |S| ≤ N No other info: |Q(D)| ≤ N2 S.Y is a key: |Q(D)| ≤ N S.Y has degree ≤ d: |Q(D)| ≤ d×N E.g. R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) No other info: |Q(D)| ≤ N3/2
Maximum Output Size maxD satisfies stats (|Q(D)|) E.g. R(X,Y) ∧ S(Y,Z), |R|, |S| ≤ N No other info: |Q(D)| ≤ N2 S.Y is a key: |Q(D)| ≤ N S.Y has degree ≤ d: |Q(D)| ≤ d×N E.g. R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) No other info: |Q(D)| ≤ N3/2
Main Principle Find information-theoretic proof of the upper bound, or the submodular width Convert proof to algorithm
Proof of Upper Bound Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) Database D entropic function H
Proof of Upper Bound Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) Database D entropic function H Database D R(X,Y) S(Y,Z) T(Z,X) X Y a 3 2 b d Y Z 3 m 2 q Z X m a q b d
Proof of Upper Bound Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) Database D entropic function H Output Q(D) Database D R(X,Y) S(Y,Z) T(Z,X) X Y Z a 3 m 2 q b d X Y a 3 2 b d Y Z 3 m 2 q Z X m a q b d
Proof of Upper Bound Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) Database D entropic function H Output Q(D) Database D R(X,Y) S(Y,Z) T(Z,X) X Y Z a 3 m 1/5 2 q b d X Y a 3 2 b d Y Z 3 m 2 q Z X m a q b d H(XYZ) = log |Q(D)|
Proof of Upper Bound Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) Database D entropic function H Output Q(D) Database D R(X,Y) S(Y,Z) T(Z,X) X Y Z a 3 m 1/5 2 q b d X Y a 3 2/5 2 1/5 b d Y Z 3 m 2/5 2 q 1/5 Z X m a 1/5 q 2/5 b d H(XYZ) = log |Q(D)|
Proof of Upper Bound Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) Database D entropic function H Output Q(D) Database D R(X,Y) S(Y,Z) T(Z,X) X Y Z a 3 m 1/5 2 q b d X Y a 3 2/5 2 1/5 b d Y Z 3 m 2/5 2 q 1/5 Z X m a 1/5 q 2/5 b d H(XYZ) = log |Q(D)| H(XY) ≤ log NR H(YZ) ≤ log NS H(Z|Y) ≤ log degS(z|y) H(XZ) ≤ log NT Cardinalitites, functional dependences, max degrees
Proof of Upper Bound Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) |R|,|S|,|T| ≤ N |Q(D)|≤ N3/2
Proof of Upper Bound Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) |R|,|S|,|T| ≤ N |Q(D)|≤ N3/2 3 log N ≥ h(XY) + h(YZ) + h(XZ) ≥ h(XYZ) + h(Y) + h(XZ) ≥ h(XYZ) + h(XYZ) + h(∅) = 2 h(XYZ) = 2 log |Output|
Proof of Upper Bound Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) |R|,|S|,|T| ≤ N |Q(D)|≤ N3/2 submodularity 3 log N ≥ h(XY) + h(YZ) + h(XZ) ≥ h(XYZ) + h(Y) + h(XZ) ≥ h(XYZ) + h(XYZ) + h(∅) = 2 h(XYZ) = 2 log |Output|
Proof of Upper Bound Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) |R|,|S|,|T| ≤ N |Q(D)|≤ N3/2 submodularity 3 log N ≥ h(XY) + h(YZ) + h(XZ) ≥ h(XYZ) + h(Y) + h(XZ) ≥ h(XYZ) + h(XYZ) + h(∅) = 2 h(XYZ) = 2 log |Output|
Proof of Upper Bound Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) |R|,|S|,|T| ≤ N |Q(D)|≤ N3/2 submodularity 3 log N ≥ h(XY) + h(YZ) + h(XZ) ≥ h(XYZ) + h(Y) + h(XZ) ≥ h(XYZ) + h(XYZ) + h(∅) = 2 h(XYZ) = 2 log |Output| submodularity
Proof of Upper Bound Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) |R|,|S|,|T| ≤ N |Q(D)|≤ N3/2 submodularity 3 log N ≥ h(XY) + h(YZ) + h(XZ) ≥ h(XYZ) + h(Y) + h(XZ) ≥ h(XYZ) + h(XYZ) + h(∅) = 2 h(XYZ) = 2 log |Output| submodularity
Proof of Upper Bound Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) |R|,|S|,|T| ≤ N |Q(D)|≤ N3/2 submodularity 3 log N ≥ h(XY) + h(YZ) + h(XZ) ≥ h(XYZ) + h(Y) + h(XZ) ≥ h(XYZ) + h(XYZ) + h(∅) = 2 h(XYZ) = 2 log |Q(D)| submodularity
Shearer’s inequality h(XY) + h(YZ) + h(XZ) ≥ 2 h(XYZ) Proof of Upper Bound Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) |R|,|S|,|T| ≤ N |Q(D)|≤ N3/2 submodularity 3 log N ≥ h(XY) + h(YZ) + h(XZ) ≥ h(XYZ) + h(Y) + h(XZ) ≥ h(XYZ) + h(XYZ) + h(∅) = 2 h(XYZ) = 2 log |Q(D)| submodularity Shearer’s inequality h(XY) + h(YZ) + h(XZ) ≥ 2 h(XYZ)
Proof to Algorithm + Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) h(XY) + h(YZ) + h(XZ) ≥ 2 h(XYZ) h(XY)+h(YZ)+h(XZ) h(XYZ) h(Y) + h(XZ) + Proof h(XYZ)
Proof to Algorithm + Runtime Õ(N3/2) Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) h(XY) + h(YZ) + h(XZ) ≥ 2 h(XYZ) Algorithm h(XY)+h(YZ)+h(XZ) R(X,Y)∧S(Y,Z)∧T(Z,X) h(XYZ) h(Y) + h(XZ) + Proof h(XYZ)
Proof to Algorithm + Runtime Õ(N3/2) Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) h(XY) + h(YZ) + h(XZ) ≥ 2 h(XYZ) Algorithm h(XY)+h(YZ)+h(XZ) R(X,Y)∧S(Y,Z)∧T(Z,X) N3/2 h(XYZ) h(Y) + h(XZ) + Rlight(X,Y)∧S(Y,Z) Proof h(XYZ) Rlight or Rheavy: degree(Y) ≤ or > N1/2
Proof to Algorithm + ∪ Runtime Õ(N3/2) Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) h(XY) + h(YZ) + h(XZ) ≥ 2 h(XYZ) Algorithm h(XY)+h(YZ)+h(XZ) R(X,Y)∧S(Y,Z)∧T(Z,X) N3/2 h(XYZ) h(Y) + h(XZ) + N3/2 Rlight(X,Y)∧S(Y,Z) ∪ Proof h(XYZ) Rheavy(Y)∧T(X,Z) Rlight or Rheavy: degree(Y) ≤ or > N1/2
Proof to Algorithm + ∪ Runtime Õ(N3/2) Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) h(XY) + h(YZ) + h(XZ) ≥ 2 h(XYZ) Algorithm h(XY)+h(YZ)+h(XZ) R(X,Y)∧S(Y,Z)∧T(Z,X) N3/2 h(XYZ) h(Y) + h(XZ) + N3/2 Rlight(X,Y)∧S(Y,Z) ∪ Proof h(XYZ) Rheavy(Y)∧T(X,Z) Rlight or Rheavy: degree(Y) ≤ or > N1/2
Another Quick Example Runtime Õ(N3/2) Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,U) ∧ A(X,Z,U) ∧ B(X,Y,U) 3 log N ≥ h(XY)+h(YZ)+h(XZ)+h(U|XZ)+h(X|YU) ≥ h(XY)+h(YZ)+h(XZ)+h(U|XYZ)+h(X|YZU) h(Y)+h(XZ)+h(X|YZU) ≥ h(XYZ)+h(U|XYZ) h(XYZU) h(XYZ)+h(X|YZU) h(XYZU)
Enumeration Problem: Discussion Cardinalities: [Atserias,Grohe,Marx’08, Ngo,Re,Rudra’13] Entropic bound = polymatroid bound Algorithm for Q(D) has single log factor Cardinalities + FDs + max degrees: Entropic bound ≨ polymatroid bound Algorithm for Q(D) has polylog factor
Outline Motivation Full CQ Boolean CQ Conclusions
Background: Tree Decomposition Informally: TD = a tree where each node t represents an enumeration problem Fractional hypetree width [Grohe,Marx’14] mintree maxnode t maxD Submodular width [Marx’2013] maxD mintree maxnode t
Q() = ∃x∃y∃z∃u R(x,y)∧S(y,z)∧T(z,u)∧K(u,x) |R|,|S|,|T|,|K| ≤ N O(N3/2) algorithm [Alon,Yuster,Zwick’97]
mintree maxnode t maxD Q() = ∃x∃y∃z∃u R(x,y)∧S(y,z)∧T(z,u)∧K(u,x) |R|,|S|,|T|,|K| ≤ N O(N3/2) algorithm [Alon,Yuster,Zwick’97] mintree maxnode t maxD
mintree maxnode t maxD Q() = ∃x∃y∃z∃u R(x,y)∧S(y,z)∧T(z,u)∧K(u,x) |R|,|S|,|T|,|K| ≤ N O(N3/2) algorithm [Alon,Yuster,Zwick’97] mintree maxnode t maxD R(x,y),S(y,z) T(z,u),K(u,x) Tree decompositions S(y,z),T(z,u) K(u,x),R(x,y)
mintree maxnode t maxD Runtime Õ(N2) (suboptimal) Z X Y Q() = ∃x∃y∃z∃u R(x,y)∧S(y,z)∧T(z,u)∧K(u,x) |R|,|S|,|T|,|K| ≤ N O(N3/2) algorithm [Alon,Yuster,Zwick’97] mintree maxnode t maxD R(x,y),S(y,z) T(z,u),K(u,x) Tree decompositions S(y,z),T(z,u) K(u,x),R(x,y) Runtime Õ(N2) (suboptimal)
mintree maxnode t maxD Runtime Õ(N2) (suboptimal) Z X Y Q() = ∃x∃y∃z∃u R(x,y)∧S(y,z)∧T(z,u)∧K(u,x) |R|,|S|,|T|,|K| ≤ N O(N3/2) algorithm [Alon,Yuster,Zwick’97] mintree maxnode t maxD R(x,y),S(y,z) T(z,u),K(u,x) Tree decompositions S(y,z),T(z,u) K(u,x),R(x,y) Runtime Õ(N2) (suboptimal)
maxD mintree maxnode t Q() = ∃x∃y∃z∃u R(x,y)∧S(y,z)∧T(z,u)∧K(u,x) |R|,|S|,|T|,|K| ≤ N O(N3/2) algorithm [Alon,Yuster,Zwick’97] maxD mintree maxnode t R(x,y),S(y,z) T(z,u),K(u,x) Tree decompositions S(y,z),T(z,u) K(u,x),R(x,y)
maxD mintree maxnode t Q() = ∃x∃y∃z∃u R(x,y)∧S(y,z)∧T(z,u)∧K(u,x) |R|,|S|,|T|,|K| ≤ N O(N3/2) algorithm [Alon,Yuster,Zwick’97] maxD mintree maxnode t R(x,y),S(y,z) T(z,u),K(u,x) Tree decompositions S(y,z),T(z,u) K(u,x),R(x,y) T1 min( max(h(xyz),h(zux)), max(h(yzu),h(uxy))) =
maxD mintree maxnode t Q() = ∃x∃y∃z∃u R(x,y)∧S(y,z)∧T(z,u)∧K(u,x) |R|,|S|,|T|,|K| ≤ N O(N3/2) algorithm [Alon,Yuster,Zwick’97] maxD mintree maxnode t R(x,y),S(y,z) T(z,u),K(u,x) Tree decompositions S(y,z),T(z,u) K(u,x),R(x,y) T1 min( max(h(xyz),h(zux)), max(h(yzu),h(uxy))) = T2
maxD mintree maxnode t Q() = ∃x∃y∃z∃u R(x,y)∧S(y,z)∧T(z,u)∧K(u,x) |R|,|S|,|T|,|K| ≤ N O(N3/2) algorithm [Alon,Yuster,Zwick’97] maxD mintree maxnode t R(x,y),S(y,z) T(z,u),K(u,x) Tree decompositions S(y,z),T(z,u) K(u,x),R(x,y) T1 min( max(h(xyz),h(zux)), max(h(yzu),h(uxy))) = T2
maxD mintree maxnode t Q() = ∃x∃y∃z∃u R(x,y)∧S(y,z)∧T(z,u)∧K(u,x) |R|,|S|,|T|,|K| ≤ N O(N3/2) algorithm [Alon,Yuster,Zwick’97] maxD mintree maxnode t R(x,y),S(y,z) T(z,u),K(u,x) Tree decompositions S(y,z),T(z,u) K(u,x),R(x,y) T1 min( max(h(xyz),h(zux)), max(h(yzu),h(uxy))) = T2 = max(min(h(xyz),h(yzu)), min(h(xyz),h(uxy)), min(h(zux),h(yzu)), min(h(zux),h(uxy)))
3 log N ≥ h(xy) + h(yz) + h(zu) Q() = ∃x∃y∃z∃u R(x,y)∧S(y,z)∧T(z,u)∧K(u,x) |R|,|S|,|T|,|K| ≤ N O(N3/2) algorithm [Alon,Yuster,Zwick’97] maxD mintree maxnode t R(x,y),S(y,z) T(z,u),K(u,x) Tree decompositions S(y,z),T(z,u) K(u,x),R(x,y) T1 min( max(h(xyz),h(zux)), max(h(yzu),h(uxy))) = 3 log N ≥ h(xy) + h(yz) + h(zu) T2 = max(min(h(xyz),h(yzu)), min(h(xyz),h(uxy)), min(h(zux),h(yzu)), min(h(zux),h(uxy)))
3 log N ≥ h(xy) + h(yz) + h(zu) ≥ h(xyz) + h(y) + h(zu) Q() = ∃x∃y∃z∃u R(x,y)∧S(y,z)∧T(z,u)∧K(u,x) |R|,|S|,|T|,|K| ≤ N O(N3/2) algorithm [Alon,Yuster,Zwick’97] maxD mintree maxnode t R(x,y),S(y,z) T(z,u),K(u,x) Tree decompositions S(y,z),T(z,u) K(u,x),R(x,y) T1 min( max(h(xyz),h(zux)), max(h(yzu),h(uxy))) = 3 log N ≥ h(xy) + h(yz) + h(zu) ≥ h(xyz) + h(y) + h(zu) T2 = max(min(h(xyz),h(yzu)), min(h(xyz),h(uxy)), min(h(zux),h(yzu)), min(h(zux),h(uxy)))
3 log N ≥ h(xy) + h(yz) + h(zu) ≥ h(xyz) + h(y) + h(zu) Q() = ∃x∃y∃z∃u R(x,y)∧S(y,z)∧T(z,u)∧K(u,x) |R|,|S|,|T|,|K| ≤ N O(N3/2) algorithm [Alon,Yuster,Zwick’97] maxD mintree maxnode t R(x,y),S(y,z) T(z,u),K(u,x) Tree decompositions S(y,z),T(z,u) K(u,x),R(x,y) T1 min( max(h(xyz),h(zux)), max(h(yzu),h(uxy))) = 3 log N ≥ h(xy) + h(yz) + h(zu) ≥ h(xyz) + h(y) + h(zu) ≥ h(xyz) + h(yzu) + h(∅) T2 = max(min(h(xyz),h(yzu)), min(h(xyz),h(uxy)), min(h(zux),h(yzu)), min(h(zux),h(uxy)))
3 log N ≥ h(xy) + h(yz) + h(zu) ≥ h(xyz) + h(y) + h(zu) Q() = ∃x∃y∃z∃u R(x,y)∧S(y,z)∧T(z,u)∧K(u,x) |R|,|S|,|T|,|K| ≤ N O(N3/2) algorithm [Alon,Yuster,Zwick’97] maxD mintree maxnode t R(x,y),S(y,z) T(z,u),K(u,x) Tree decompositions S(y,z),T(z,u) K(u,x),R(x,y) T1 min( max(h(xyz),h(zux)), max(h(yzu),h(uxy))) = 3 log N ≥ h(xy) + h(yz) + h(zu) ≥ h(xyz) + h(y) + h(zu) ≥ h(xyz) + h(yzu) + h(∅) ≥ 2 min(h(xyz),h(yzu)) T2 = max(min(h(xyz),h(yzu)), min(h(xyz),h(uxy)), min(h(zux),h(yzu)), min(h(zux),h(uxy)))
3 log N ≥ h(xy) + h(yz) + h(zu) ≥ h(xyz) + h(y) + h(zu) Q() = ∃x∃y∃z∃u R(x,y)∧S(y,z)∧T(z,u)∧K(u,x) |R|,|S|,|T|,|K| ≤ N O(N3/2) algorithm [Alon,Yuster,Zwick’97] maxD mintree maxnode t R(x,y),S(y,z) T(z,u),K(u,x) Tree decompositions subw(Q) = 3/2 log N S(y,z),T(z,u) K(u,x),R(x,y) T1 min( max(h(xyz),h(zux)), max(h(yzu),h(uxy))) = 3 log N ≥ h(xy) + h(yz) + h(zu) ≥ h(xyz) + h(y) + h(zu) ≥ h(xyz) + h(yzu) + h(∅) ≥ 2 min(h(xyz),h(yzu)) T2 = max(min(h(xyz),h(yzu)), min(h(xyz),h(uxy)), min(h(zux),h(yzu)), min(h(zux),h(uxy))) ≤ 3/2 log N
Proof to Algorithm Use the proof of: to compute the disjunctive datalog rule: (details omitted) h(xyz)+h(yzu) ≤ h(xy) + h(yz) + h(zu) A(x,y,z) ∨ B(y,z,u) R(x,y) ∧ S(y,z) ∧ T(z,u) Runtime Õ(N3/2)
Outline Motivation Full CQ Boolean CQ Conclusions
Conclusions Query evaluation summary: Information theory Proof Proof Algorithm Open problems: Better “Proof Algorithm” Fine-grained lower bounds
Thank You! Questions?