Presentation is loading. Please wait.

Presentation is loading. Please wait.

Optimal Query Processing Meets Information Theory

Similar presentations


Presentation on theme: "Optimal Query Processing Meets Information Theory"— Presentation transcript:

1 Optimal Query Processing Meets Information Theory
Dan Suciu – University of Washington Joint w/ Mahmoud Abo-Khamis and Hung Ngo RelationalAI Inc

2 Basic Question What is the optimal runtime to compute a query Q on a database D? Data complexity: Q is fixed, runtime = f(D) Applications: CSP where the nodes of Q are the variables of the CSP problem, its hyperedges are the constraints, and D represents the valid assignments Databases: …

3 Example select * -- natural join from R, S, T
Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) Will prove later: |Q(D)| ≤ N3/2 select * natural join from R, S, T where R.Y = S.Y and S.Z = T.Z and T.X = R.X

4 Example select * -- natural join from R, S, T
Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) Will prove later: |Q(D)| ≤ N3/2 select * natural join from R, S, T where R.Y = S.Y and S.Z = T.Z and T.X = R.X Query plan T(Z,X) R(X,Y) S(Y,Z)

5 Example select * -- natural join from R, S, T
Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) Will prove later: |Q(D)| ≤ N3/2 select * natural join from R, S, T where R.Y = S.Y and S.Z = T.Z and T.X = R.X R: X Y 1 2 3 ... N/2 Query plan T(Z,X) N R(X,Y) S(Y,Z)

6 Example select * -- natural join from R, S, T
Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) Will prove later: |Q(D)| ≤ N3/2 select * natural join from R, S, T where R.Y = S.Y and S.Z = T.Z and T.X = R.X R: X Y 1 2 3 ... N/2 S: (same as R) Y Z 1 2 3 ... N/2 T: (same as R) Z X 1 2 3 ... N/2 Query plan T(Z,X) N R(X,Y) S(Y,Z)

7 Example select * -- natural join from R, S, T
Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) Will prove later: |Q(D)| ≤ N3/2 select * natural join from R, S, T where R.Y = S.Y and S.Z = T.Z and T.X = R.X R: X Y 1 2 3 ... N/2 S: (same as R) Y Z 1 2 3 ... N/2 T: (same as R) Z X 1 2 3 ... N/2 Query plan Θ(N2) tuples T(Z,X) N R(X,Y) S(Y,Z)

8 Example select * -- natural join from R, S, T
Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) Will prove later: |Q(D)| ≤ N3/2 select * natural join from R, S, T where R.Y = S.Y and S.Z = T.Z and T.X = R.X 0 tuples R: X Y 1 2 3 ... N/2 S: (same as R) Y Z 1 2 3 ... N/2 T: (same as R) Z X 1 2 3 ... N/2 Query plan Θ(N2) tuples T(Z,X) N R(X,Y) S(Y,Z)

9 Example select * -- natural join from R, S, T
Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) Will prove later: |Q(D)| ≤ N3/2 select * natural join from R, S, T where R.Y = S.Y and S.Z = T.Z and T.X = R.X 0 tuples R: X Y 1 2 3 ... N/2 S: (same as R) Y Z 1 2 3 ... N/2 T: (same as R) Z X 1 2 3 ... N/2 Query plan Θ(N2) tuples T(Z,X) N R(X,Y) S(Y,Z) Every plan costs Θ(N2)!

10 Outline Motivation Full CQ Boolean CQ Conclusions

11 Maximum Output Size maxD satisfies stats (|Q(D)|)
E.g. R(X,Y) ∧ S(Y,Z), |R|, |S| ≤ N No other info: |Q(D)| ≤ N2 S.Y is a key: |Q(D)| ≤ N S.Y has degree ≤ d: |Q(D)| ≤ d×N E.g. R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) No other info: |Q(D)| ≤ N3/2

12 Maximum Output Size maxD satisfies stats (|Q(D)|)
E.g. R(X,Y) ∧ S(Y,Z), |R|, |S| ≤ N No other info: |Q(D)| ≤ N2 S.Y is a key: |Q(D)| ≤ N S.Y has degree ≤ d: |Q(D)| ≤ d×N E.g. R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) No other info: |Q(D)| ≤ N3/2

13 Maximum Output Size maxD satisfies stats (|Q(D)|)
E.g. R(X,Y) ∧ S(Y,Z), |R|, |S| ≤ N No other info: |Q(D)| ≤ N2 S.Y is a key: |Q(D)| ≤ N S.Y has degree ≤ d: |Q(D)| ≤ d×N E.g. R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) No other info: |Q(D)| ≤ N3/2

14 Maximum Output Size maxD satisfies stats (|Q(D)|)
E.g. R(X,Y) ∧ S(Y,Z), |R|, |S| ≤ N No other info: |Q(D)| ≤ N2 S.Y is a key: |Q(D)| ≤ N S.Y has degree ≤ d: |Q(D)| ≤ d×N E.g. R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) No other info: |Q(D)| ≤ N3/2

15 Maximum Output Size maxD satisfies stats (|Q(D)|)
E.g. R(X,Y) ∧ S(Y,Z), |R|, |S| ≤ N No other info: |Q(D)| ≤ N2 S.Y is a key: |Q(D)| ≤ N S.Y has degree ≤ d: |Q(D)| ≤ d×N E.g. R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) No other info: |Q(D)| ≤ N3/2

16 Maximum Output Size maxD satisfies stats (|Q(D)|)
E.g. R(X,Y) ∧ S(Y,Z), |R|, |S| ≤ N No other info: |Q(D)| ≤ N2 S.Y is a key: |Q(D)| ≤ N S.Y has degree ≤ d: |Q(D)| ≤ d×N E.g. R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) No other info: |Q(D)| ≤ N3/2

17 Maximum Output Size maxD satisfies stats (|Q(D)|)
E.g. R(X,Y) ∧ S(Y,Z), |R|, |S| ≤ N No other info: |Q(D)| ≤ N2 S.Y is a key: |Q(D)| ≤ N S.Y has degree ≤ d: |Q(D)| ≤ d×N E.g. R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) No other info: |Q(D)| ≤ N3/2

18 Maximum Output Size maxD satisfies stats (|Q(D)|)
E.g. R(X,Y) ∧ S(Y,Z), |R|, |S| ≤ N No other info: |Q(D)| ≤ N2 S.Y is a key: |Q(D)| ≤ N S.Y has degree ≤ d: |Q(D)| ≤ d×N E.g. R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) No other info: |Q(D)| ≤ N3/2

19 Main Principle Find information-theoretic proof of the upper bound, or the submodular width Convert proof to algorithm

20 Proof of Upper Bound Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) Database D  entropic function H

21 Proof of Upper Bound Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) Database D  entropic function H Database D R(X,Y) S(Y,Z) T(Z,X) X Y a 3 2 b d Y Z 3 m 2 q Z X m a q b d

22 Proof of Upper Bound Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) Database D  entropic function H Output Q(D) Database D R(X,Y) S(Y,Z) T(Z,X) X Y Z a 3 m 2 q b d X Y a 3 2 b d Y Z 3 m 2 q Z X m a q b d

23 Proof of Upper Bound Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) Database D  entropic function H Output Q(D) Database D R(X,Y) S(Y,Z) T(Z,X) X Y Z a 3 m 1/5 2 q b d X Y a 3 2 b d Y Z 3 m 2 q Z X m a q b d H(XYZ) = log |Q(D)|

24 Proof of Upper Bound Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) Database D  entropic function H Output Q(D) Database D R(X,Y) S(Y,Z) T(Z,X) X Y Z a 3 m 1/5 2 q b d X Y a 3 2/5 2 1/5 b d Y Z 3 m 2/5 2 q 1/5 Z X m a 1/5 q 2/5 b d H(XYZ) = log |Q(D)|

25 Proof of Upper Bound Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) Database D  entropic function H Output Q(D) Database D R(X,Y) S(Y,Z) T(Z,X) X Y Z a 3 m 1/5 2 q b d X Y a 3 2/5 2 1/5 b d Y Z 3 m 2/5 2 q 1/5 Z X m a 1/5 q 2/5 b d H(XYZ) = log |Q(D)| H(XY) ≤ log NR H(YZ) ≤ log NS H(Z|Y) ≤ log degS(z|y) H(XZ) ≤ log NT Cardinalitites, functional dependences, max degrees

26 Proof of Upper Bound Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) |R|,|S|,|T| ≤ N  |Q(D)|≤ N3/2

27 Proof of Upper Bound Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) |R|,|S|,|T| ≤ N  |Q(D)|≤ N3/2 3 log N ≥ h(XY) + h(YZ) + h(XZ) ≥ h(XYZ) + h(Y) + h(XZ) ≥ h(XYZ) + h(XYZ) + h(∅) = 2 h(XYZ) = 2 log |Output|

28 Proof of Upper Bound Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) |R|,|S|,|T| ≤ N  |Q(D)|≤ N3/2 submodularity 3 log N ≥ h(XY) + h(YZ) + h(XZ) ≥ h(XYZ) + h(Y) + h(XZ) ≥ h(XYZ) + h(XYZ) + h(∅) = 2 h(XYZ) = 2 log |Output|

29 Proof of Upper Bound Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) |R|,|S|,|T| ≤ N  |Q(D)|≤ N3/2 submodularity 3 log N ≥ h(XY) + h(YZ) + h(XZ) ≥ h(XYZ) + h(Y) + h(XZ) ≥ h(XYZ) + h(XYZ) + h(∅) = 2 h(XYZ) = 2 log |Output|

30 Proof of Upper Bound Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) |R|,|S|,|T| ≤ N  |Q(D)|≤ N3/2 submodularity 3 log N ≥ h(XY) + h(YZ) + h(XZ) ≥ h(XYZ) + h(Y) + h(XZ) ≥ h(XYZ) + h(XYZ) + h(∅) = 2 h(XYZ) = 2 log |Output| submodularity

31 Proof of Upper Bound Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) |R|,|S|,|T| ≤ N  |Q(D)|≤ N3/2 submodularity 3 log N ≥ h(XY) + h(YZ) + h(XZ) ≥ h(XYZ) + h(Y) + h(XZ) ≥ h(XYZ) + h(XYZ) + h(∅) = 2 h(XYZ) = 2 log |Output| submodularity

32 Proof of Upper Bound Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) |R|,|S|,|T| ≤ N  |Q(D)|≤ N3/2 submodularity 3 log N ≥ h(XY) + h(YZ) + h(XZ) ≥ h(XYZ) + h(Y) + h(XZ) ≥ h(XYZ) + h(XYZ) + h(∅) = 2 h(XYZ) = 2 log |Q(D)| submodularity

33 Shearer’s inequality h(XY) + h(YZ) + h(XZ) ≥ 2 h(XYZ)
Proof of Upper Bound Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) |R|,|S|,|T| ≤ N  |Q(D)|≤ N3/2 submodularity 3 log N ≥ h(XY) + h(YZ) + h(XZ) ≥ h(XYZ) + h(Y) + h(XZ) ≥ h(XYZ) + h(XYZ) + h(∅) = 2 h(XYZ) = 2 log |Q(D)| submodularity Shearer’s inequality h(XY) + h(YZ) + h(XZ) ≥ 2 h(XYZ)

34 Proof to Algorithm + Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X)
h(XY) + h(YZ) + h(XZ) ≥ 2 h(XYZ) h(XY)+h(YZ)+h(XZ) h(XYZ) h(Y) + h(XZ) + Proof h(XYZ)

35 Proof to Algorithm + Runtime Õ(N3/2)
Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) h(XY) + h(YZ) + h(XZ) ≥ 2 h(XYZ) Algorithm h(XY)+h(YZ)+h(XZ) R(X,Y)∧S(Y,Z)∧T(Z,X) h(XYZ) h(Y) + h(XZ) + Proof h(XYZ)

36 Proof to Algorithm + Runtime Õ(N3/2)
Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) h(XY) + h(YZ) + h(XZ) ≥ 2 h(XYZ) Algorithm h(XY)+h(YZ)+h(XZ) R(X,Y)∧S(Y,Z)∧T(Z,X) N3/2 h(XYZ) h(Y) + h(XZ) + Rlight(X,Y)∧S(Y,Z) Proof h(XYZ) Rlight or Rheavy: degree(Y) ≤ or > N1/2

37 Proof to Algorithm + ∪ Runtime Õ(N3/2)
Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) h(XY) + h(YZ) + h(XZ) ≥ 2 h(XYZ) Algorithm h(XY)+h(YZ)+h(XZ) R(X,Y)∧S(Y,Z)∧T(Z,X) N3/2 h(XYZ) h(Y) + h(XZ) + N3/2 Rlight(X,Y)∧S(Y,Z) Proof h(XYZ) Rheavy(Y)∧T(X,Z) Rlight or Rheavy: degree(Y) ≤ or > N1/2

38 Proof to Algorithm + ∪ Runtime Õ(N3/2)
Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) h(XY) + h(YZ) + h(XZ) ≥ 2 h(XYZ) Algorithm h(XY)+h(YZ)+h(XZ) R(X,Y)∧S(Y,Z)∧T(Z,X) N3/2 h(XYZ) h(Y) + h(XZ) + N3/2 Rlight(X,Y)∧S(Y,Z) Proof h(XYZ) Rheavy(Y)∧T(X,Z) Rlight or Rheavy: degree(Y) ≤ or > N1/2

39 Another Quick Example Runtime Õ(N3/2)
Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,U) ∧ A(X,Z,U) ∧ B(X,Y,U) 3 log N ≥ h(XY)+h(YZ)+h(XZ)+h(U|XZ)+h(X|YU) ≥ h(XY)+h(YZ)+h(XZ)+h(U|XYZ)+h(X|YZU) h(Y)+h(XZ)+h(X|YZU) ≥ h(XYZ)+h(U|XYZ) h(XYZU) h(XYZ)+h(X|YZU) h(XYZU)

40 Enumeration Problem: Discussion
Cardinalities: [Atserias,Grohe,Marx’08, Ngo,Re,Rudra’13] Entropic bound = polymatroid bound Algorithm for Q(D) has single log factor Cardinalities + FDs + max degrees: Entropic bound ≨ polymatroid bound Algorithm for Q(D) has polylog factor

41 Outline Motivation Full CQ Boolean CQ Conclusions

42 Background: Tree Decomposition
Informally: TD = a tree where each node t represents an enumeration problem Fractional hypetree width [Grohe,Marx’14] mintree maxnode t maxD Submodular width [Marx’2013] maxD mintree maxnode t

43 Q() = ∃x∃y∃z∃u R(x,y)∧S(y,z)∧T(z,u)∧K(u,x)
|R|,|S|,|T|,|K| ≤ N O(N3/2) algorithm [Alon,Yuster,Zwick’97]

44 mintree maxnode t maxD Q() = ∃x∃y∃z∃u R(x,y)∧S(y,z)∧T(z,u)∧K(u,x)
|R|,|S|,|T|,|K| ≤ N O(N3/2) algorithm [Alon,Yuster,Zwick’97] mintree maxnode t maxD

45 mintree maxnode t maxD Q() = ∃x∃y∃z∃u R(x,y)∧S(y,z)∧T(z,u)∧K(u,x)
|R|,|S|,|T|,|K| ≤ N O(N3/2) algorithm [Alon,Yuster,Zwick’97] mintree maxnode t maxD R(x,y),S(y,z) T(z,u),K(u,x) Tree decompositions S(y,z),T(z,u) K(u,x),R(x,y)

46 mintree maxnode t maxD Runtime Õ(N2) (suboptimal)
Z X Y Q() = ∃x∃y∃z∃u R(x,y)∧S(y,z)∧T(z,u)∧K(u,x) |R|,|S|,|T|,|K| ≤ N O(N3/2) algorithm [Alon,Yuster,Zwick’97] mintree maxnode t maxD R(x,y),S(y,z) T(z,u),K(u,x) Tree decompositions S(y,z),T(z,u) K(u,x),R(x,y) Runtime Õ(N2) (suboptimal)

47 mintree maxnode t maxD Runtime Õ(N2) (suboptimal)
Z X Y Q() = ∃x∃y∃z∃u R(x,y)∧S(y,z)∧T(z,u)∧K(u,x) |R|,|S|,|T|,|K| ≤ N O(N3/2) algorithm [Alon,Yuster,Zwick’97] mintree maxnode t maxD R(x,y),S(y,z) T(z,u),K(u,x) Tree decompositions S(y,z),T(z,u) K(u,x),R(x,y) Runtime Õ(N2) (suboptimal)

48 maxD mintree maxnode t Q() = ∃x∃y∃z∃u R(x,y)∧S(y,z)∧T(z,u)∧K(u,x)
|R|,|S|,|T|,|K| ≤ N O(N3/2) algorithm [Alon,Yuster,Zwick’97] maxD mintree maxnode t R(x,y),S(y,z) T(z,u),K(u,x) Tree decompositions S(y,z),T(z,u) K(u,x),R(x,y)

49 maxD mintree maxnode t Q() = ∃x∃y∃z∃u R(x,y)∧S(y,z)∧T(z,u)∧K(u,x)
|R|,|S|,|T|,|K| ≤ N O(N3/2) algorithm [Alon,Yuster,Zwick’97] maxD mintree maxnode t R(x,y),S(y,z) T(z,u),K(u,x) Tree decompositions S(y,z),T(z,u) K(u,x),R(x,y) T1 min( max(h(xyz),h(zux)), max(h(yzu),h(uxy))) =

50 maxD mintree maxnode t Q() = ∃x∃y∃z∃u R(x,y)∧S(y,z)∧T(z,u)∧K(u,x)
|R|,|S|,|T|,|K| ≤ N O(N3/2) algorithm [Alon,Yuster,Zwick’97] maxD mintree maxnode t R(x,y),S(y,z) T(z,u),K(u,x) Tree decompositions S(y,z),T(z,u) K(u,x),R(x,y) T1 min( max(h(xyz),h(zux)), max(h(yzu),h(uxy))) = T2

51 maxD mintree maxnode t Q() = ∃x∃y∃z∃u R(x,y)∧S(y,z)∧T(z,u)∧K(u,x)
|R|,|S|,|T|,|K| ≤ N O(N3/2) algorithm [Alon,Yuster,Zwick’97] maxD mintree maxnode t R(x,y),S(y,z) T(z,u),K(u,x) Tree decompositions S(y,z),T(z,u) K(u,x),R(x,y) T1 min( max(h(xyz),h(zux)), max(h(yzu),h(uxy))) = T2

52 maxD mintree maxnode t Q() = ∃x∃y∃z∃u R(x,y)∧S(y,z)∧T(z,u)∧K(u,x)
|R|,|S|,|T|,|K| ≤ N O(N3/2) algorithm [Alon,Yuster,Zwick’97] maxD mintree maxnode t R(x,y),S(y,z) T(z,u),K(u,x) Tree decompositions S(y,z),T(z,u) K(u,x),R(x,y) T1 min( max(h(xyz),h(zux)), max(h(yzu),h(uxy))) = T2 = max(min(h(xyz),h(yzu)), min(h(xyz),h(uxy)), min(h(zux),h(yzu)), min(h(zux),h(uxy)))

53 3 log N ≥ h(xy) + h(yz) + h(zu)
Q() = ∃x∃y∃z∃u R(x,y)∧S(y,z)∧T(z,u)∧K(u,x) |R|,|S|,|T|,|K| ≤ N O(N3/2) algorithm [Alon,Yuster,Zwick’97] maxD mintree maxnode t R(x,y),S(y,z) T(z,u),K(u,x) Tree decompositions S(y,z),T(z,u) K(u,x),R(x,y) T1 min( max(h(xyz),h(zux)), max(h(yzu),h(uxy))) = 3 log N ≥ h(xy) + h(yz) + h(zu) T2 = max(min(h(xyz),h(yzu)), min(h(xyz),h(uxy)), min(h(zux),h(yzu)), min(h(zux),h(uxy)))

54 3 log N ≥ h(xy) + h(yz) + h(zu) ≥ h(xyz) + h(y) + h(zu)
Q() = ∃x∃y∃z∃u R(x,y)∧S(y,z)∧T(z,u)∧K(u,x) |R|,|S|,|T|,|K| ≤ N O(N3/2) algorithm [Alon,Yuster,Zwick’97] maxD mintree maxnode t R(x,y),S(y,z) T(z,u),K(u,x) Tree decompositions S(y,z),T(z,u) K(u,x),R(x,y) T1 min( max(h(xyz),h(zux)), max(h(yzu),h(uxy))) = 3 log N ≥ h(xy) + h(yz) + h(zu) ≥ h(xyz) + h(y) + h(zu) T2 = max(min(h(xyz),h(yzu)), min(h(xyz),h(uxy)), min(h(zux),h(yzu)), min(h(zux),h(uxy)))

55 3 log N ≥ h(xy) + h(yz) + h(zu) ≥ h(xyz) + h(y) + h(zu)
Q() = ∃x∃y∃z∃u R(x,y)∧S(y,z)∧T(z,u)∧K(u,x) |R|,|S|,|T|,|K| ≤ N O(N3/2) algorithm [Alon,Yuster,Zwick’97] maxD mintree maxnode t R(x,y),S(y,z) T(z,u),K(u,x) Tree decompositions S(y,z),T(z,u) K(u,x),R(x,y) T1 min( max(h(xyz),h(zux)), max(h(yzu),h(uxy))) = 3 log N ≥ h(xy) + h(yz) + h(zu) ≥ h(xyz) + h(y) + h(zu) ≥ h(xyz) + h(yzu) + h(∅) T2 = max(min(h(xyz),h(yzu)), min(h(xyz),h(uxy)), min(h(zux),h(yzu)), min(h(zux),h(uxy)))

56 3 log N ≥ h(xy) + h(yz) + h(zu) ≥ h(xyz) + h(y) + h(zu)
Q() = ∃x∃y∃z∃u R(x,y)∧S(y,z)∧T(z,u)∧K(u,x) |R|,|S|,|T|,|K| ≤ N O(N3/2) algorithm [Alon,Yuster,Zwick’97] maxD mintree maxnode t R(x,y),S(y,z) T(z,u),K(u,x) Tree decompositions S(y,z),T(z,u) K(u,x),R(x,y) T1 min( max(h(xyz),h(zux)), max(h(yzu),h(uxy))) = 3 log N ≥ h(xy) + h(yz) + h(zu) ≥ h(xyz) + h(y) + h(zu) ≥ h(xyz) + h(yzu) + h(∅) ≥ 2 min(h(xyz),h(yzu)) T2 = max(min(h(xyz),h(yzu)), min(h(xyz),h(uxy)), min(h(zux),h(yzu)), min(h(zux),h(uxy)))

57 3 log N ≥ h(xy) + h(yz) + h(zu) ≥ h(xyz) + h(y) + h(zu)
Q() = ∃x∃y∃z∃u R(x,y)∧S(y,z)∧T(z,u)∧K(u,x) |R|,|S|,|T|,|K| ≤ N O(N3/2) algorithm [Alon,Yuster,Zwick’97] maxD mintree maxnode t R(x,y),S(y,z) T(z,u),K(u,x) Tree decompositions subw(Q) = 3/2 log N S(y,z),T(z,u) K(u,x),R(x,y) T1 min( max(h(xyz),h(zux)), max(h(yzu),h(uxy))) = 3 log N ≥ h(xy) + h(yz) + h(zu) ≥ h(xyz) + h(y) + h(zu) ≥ h(xyz) + h(yzu) + h(∅) ≥ 2 min(h(xyz),h(yzu)) T2 = max(min(h(xyz),h(yzu)), min(h(xyz),h(uxy)), min(h(zux),h(yzu)), min(h(zux),h(uxy))) ≤ 3/2 log N

58 Proof to Algorithm Use the proof of: to compute the disjunctive datalog rule: (details omitted) h(xyz)+h(yzu) ≤ h(xy) + h(yz) + h(zu) A(x,y,z) ∨ B(y,z,u)  R(x,y) ∧ S(y,z) ∧ T(z,u) Runtime Õ(N3/2)

59 Outline Motivation Full CQ Boolean CQ Conclusions

60 Conclusions Query evaluation summary: Information theory  Proof
Proof  Algorithm Open problems: Better “Proof  Algorithm” Fine-grained lower bounds

61 Thank You! Questions?


Download ppt "Optimal Query Processing Meets Information Theory"

Similar presentations


Ads by Google