
1 Computing Full Disjunctions
Yaron Kanza, Yehoshua Sagiv
The Selim and Rachel Benin School of Engineering and Computer Science, The Hebrew University of Jerusalem

2 A Formal Definition of Full Disjunction

3 Preliminary Notations
Given
– a set of relations $r_1, \ldots, r_n$
– with schemes $R_1, \ldots, R_n$, respectively,
we denote by $t_{ij}$ the $j$-th tuple of $r_i$. For $X \subseteq R_i$, we denote by $t_{ij}[X]$ the projection of $t_{ij}$ on $X$.
Next, we give some preliminary definitions.

4 Scheme Graph
Two distinct schemes $R_i$ and $R_j$ are connected if $R_i \cap R_j$ is non-empty. The scheme graph of $R_1, \ldots, R_n$ consists of
– a node for each scheme $R_i$
– an edge between $R_i$ and $R_j$ if $R_i$ and $R_j$ are connected
(Figure: an example scheme graph over the schemes Movies, Actors, Actors-that-Directed and Acted-in.)

5 Connected Relation Schemes
Relation schemes $R_{i_1}, \ldots, R_{i_m}$ are connected if their scheme graph is connected. Tuples $t_{i_1 j_1}, \ldots, t_{i_m j_m}$, from $m$ distinct relations, are connected if the relation schemes of these relations are connected.
(Figure: Movies, Actors and Acted-in are connected relation schemes; Movies and Actors alone are unconnected relation schemes.)
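The two definitions above can be made concrete with a short sketch. It assumes schemes are Python frozensets of attribute names; the scheme names and attributes below are hypothetical stand-ins for the Movies / Actors / Acted-in example, not taken from the slides. Connectivity is checked with a plain depth-first search over the scheme graph of the chosen schemes.

```python
from itertools import combinations

# Hypothetical example schemes; the attribute names are illustrative only.
SCHEMES = {
    "Movies": frozenset({"title", "year"}),
    "Actors": frozenset({"actor", "birth_date"}),
    "ActedIn": frozenset({"actor", "title"}),
}

def scheme_graph(schemes):
    """Edges of the scheme graph: pairs of distinct schemes that share an attribute."""
    return {frozenset((a, b)) for a, b in combinations(sorted(schemes), 2)
            if schemes[a] & schemes[b]}

def schemes_connected(schemes, names):
    """True if the scheme graph of the chosen schemes is connected (DFS)."""
    names = list(names)
    if not names:
        return False
    seen, stack = {names[0]}, [names[0]]
    while stack:
        current = stack.pop()
        for other in names:
            if other not in seen and schemes[current] & schemes[other]:
                seen.add(other)
                stack.append(other)
    return len(seen) == len(names)

print(scheme_graph(SCHEMES))
print(schemes_connected(SCHEMES, ["Movies", "Actors", "ActedIn"]))  # True
print(schemes_connected(SCHEMES, ["Movies", "Actors"]))             # False
```

Tuples from these relations are then connected exactly when `schemes_connected` holds for the names of the relations they come from.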

6 Join-Consistent Tuples
Two tuples $t_{i_1 j_1}$ and $t_{i_2 j_2}$ are join consistent if $t_{i_1 j_1}[R_{i_1} \cap R_{i_2}] = t_{i_2 j_2}[R_{i_1} \cap R_{i_2}]$. Tuples $t_{i_1 j_1}, \ldots, t_{i_m j_m}$, from $m$ distinct relations, are join consistent if every pair of connected tuples among them is join consistent.
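A minimal sketch of the join-consistency test, assuming each tuple is a Python dict from attribute to value and each scheme is a frozenset (the names `project`, `join_consistent` and the example data are hypothetical). Only connected pairs are compared, as in the definition; for an unconnected pair the intersection is empty, so the test would pass trivially anyway.

```python
from itertools import combinations

def project(t, attrs):
    """t[X]: restrict a tuple (a dict from attribute to value) to the attributes X."""
    return {a: t[a] for a in attrs}

def pair_join_consistent(t1, r1, t2, r2):
    """t1[R1 ∩ R2] == t2[R1 ∩ R2]."""
    common = r1 & r2
    return project(t1, common) == project(t2, common)

def join_consistent(tagged):
    """tagged: list of (tuple, scheme) pairs taken from distinct relations.
    Only connected pairs (schemes sharing an attribute) are compared."""
    return all(pair_join_consistent(t1, r1, t2, r2)
               for (t1, r1), (t2, r2) in combinations(tagged, 2)
               if r1 & r2)

MOVIES = frozenset({"title", "year"})      # hypothetical schemes
ACTED_IN = frozenset({"actor", "title"})
zelig = {"title": "Zelig", "year": 1983}
print(join_consistent([(zelig, MOVIES), ({"actor": "Woody Allen", "title": "Zelig"}, ACTED_IN)]))  # True
print(join_consistent([(zelig, MOVIES), ({"actor": "Woody Allen", "title": "Antz"}, ACTED_IN)]))   # False
```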

7 Universal Tuple
A universal tuple $u$ is defined over all the attributes in $R_1 \cup \cdots \cup R_n$ and consists of null and non-null values. We denote by $\hat{u}$ the non-null portion of $u$. A universal tuple is called an integrated tuple if there are $m$ connected and join-consistent tuples $t_{i_1 j_1}, \ldots, t_{i_m j_m}$ such that $\hat{u}$ is the natural join of $t_{i_1 j_1}, \ldots, t_{i_m j_m}$.
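Building an integrated tuple can be sketched as a natural join followed by padding with nulls. This assumes dict tuples and uses `None` as the null value; `natural_join`, `to_universal` and the attribute set are hypothetical names. It also assumes the input tuples are already connected: for connected tuples, the join below succeeds exactly when they are join consistent, because two tuples can only conflict on an attribute their schemes share.

```python
def natural_join(tuples):
    """Merge dict tuples; return None if two tuples disagree on a shared attribute."""
    joined = {}
    for t in tuples:
        for attr, value in t.items():
            if attr in joined and joined[attr] != value:
                return None
            joined[attr] = value
    return joined

def to_universal(hat_u, all_attrs):
    """Pad û with None (null) on every attribute of R_1 ∪ … ∪ R_n it does not cover."""
    return {a: hat_u.get(a) for a in sorted(all_attrs)}

ALL_ATTRS = {"title", "year", "actor", "birth_date"}  # hypothetical
hat_u = natural_join([{"title": "Zelig", "year": 1983},
                      {"actor": "Woody Allen", "title": "Zelig"}])
u = to_universal(hat_u, ALL_ATTRS)
print(u)
```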

8 Maximal Universal Tuple
A universal tuple $u$ subsumes a universal tuple $v$ if $u$ is equal to $v$ on all the non-null attributes of $v$ (i.e., $u$ can be created from $v$ by replacing some null values with non-null values). In a given set $D$, a tuple $u$ is maximal if there is no tuple in $D$, other than $u$, that subsumes $u$.
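Subsumption and maximality translate almost directly into code. In this sketch universal tuples are dicts over the same attributes with `None` standing for null; the function names and sample tuples are hypothetical. Duplicates are removed first so that $D$ behaves as a set.

```python
def subsumes(u, v):
    """u subsumes v if u agrees with v on every attribute where v is non-null."""
    return all(u[a] == value for a, value in v.items() if value is not None)

def maximal(D):
    """Tuples of D (deduplicated) that no other tuple of D subsumes."""
    unique = []
    for u in D:
        if u not in unique:
            unique.append(u)
    return [u for u in unique
            if not any(v != u and subsumes(v, u) for v in unique)]

full = {"title": "Zelig", "year": 1983, "actor": "Woody Allen"}
partial = {"title": "Zelig", "year": 1983, "actor": None}
other = {"title": "Antz", "year": 1998, "actor": None}
print(subsumes(full, partial))   # True
print(maximal([full, partial, other, partial]))
```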

9 A Full Disjunction
The full disjunction of $r_1, \ldots, r_n$ is the set of all integrated tuples, generated from $m$ tuples of $r_1, \ldots, r_n$ (for any $m$), that are maximal in the set of all such integrated tuples.
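To make the definition concrete, here is a brute-force sketch that enumerates every connected subset of relations, every combination of their tuples, keeps the join-consistent ones as integrated tuples, and returns the maximal ones. It is exponential in the number of relations and is not the algorithm presented in these slides; it only spells out the definition. Relations are assumed to be (frozenset scheme, list of dict tuples) pairs, `None` is null, and the data is hypothetical, loosely based on the movie example.

```python
from itertools import combinations, product

def connected(schemes):
    """Is the scheme graph of these schemes (frozensets of attributes) connected?"""
    seen, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for j, other in enumerate(schemes):
            if j not in seen and schemes[i] & other:
                seen.add(j)
                stack.append(j)
    return len(seen) == len(schemes)

def natural_join(tuples):
    """Join dict tuples; None if they are not join consistent."""
    joined = {}
    for t in tuples:
        for attr, value in t.items():
            if attr in joined and joined[attr] != value:
                return None
            joined[attr] = value
    return joined

def subsumes(u, v):
    return all(u[a] == x for a, x in v.items() if x is not None)

def full_disjunction(relations):
    """relations: list of (scheme, tuples) pairs. Brute force, exponential time."""
    attrs = sorted(set().union(*(scheme for scheme, _ in relations)))
    integrated = []
    for m in range(1, len(relations) + 1):
        for chosen in combinations(relations, m):
            if not connected([scheme for scheme, _ in chosen]):
                continue
            for tuples in product(*(rows for _, rows in chosen)):
                hat_u = natural_join(tuples)
                if hat_u is not None:
                    u = {a: hat_u.get(a) for a in attrs}  # None stands for null
                    if u not in integrated:
                        integrated.append(u)
    return [u for u in integrated
            if not any(v != u and subsumes(v, u) for v in integrated)]

RELATIONS = [  # hypothetical data
    (frozenset({"title", "year"}),
     [{"title": "Zelig", "year": 1983}, {"title": "Antz", "year": 1998},
      {"title": "Toy Story", "year": 1995}]),
    (frozenset({"actor", "birth_date"}),
     [{"actor": "Woody Allen", "birth_date": "1/12/1935"}]),
    (frozenset({"actor", "title"}),
     [{"actor": "Woody Allen", "title": "Zelig"},
      {"actor": "Woody Allen", "title": "Antz"}]),
]
for row in full_disjunction(RELATIONS):
    print(row)
```

The movie with no matching Acted-in tuple survives padded with nulls, which is the behaviour that distinguishes a full disjunction from a natural join.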

10 Acyclic Schemes
Given a set of schemes $R_1, \ldots, R_n$, their scheme hypergraph consists of
– a node for each attribute that appears in some $R_i$
– for each $R_i$ ($1 \le i \le n$), a hyperedge that includes the attributes of $R_i$
α-acyclic scheme hypergraph:
– all the hyperedges can be removed by a sequence of ear removals
γ-acyclic scheme hypergraph:
– the Bachman diagram of the scheme hypergraph is acyclic
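The ear-removal test for α-acyclicity can be sketched as follows. It uses the standard notion of an ear: a hyperedge $E$ is an ear if the attributes $E$ shares with the other hyperedges all lie inside a single other hyperedge $F$. It is a known fact that the order of ear removals does not change the outcome, so the sketch simply removes the first ear it finds. Hyperedges are assumed to be Python sets of attribute names.

```python
def is_alpha_acyclic(hyperedges):
    """Repeatedly remove ears; alpha-acyclic iff at most one hyperedge remains."""
    edges = [frozenset(e) for e in hyperedges]
    while len(edges) > 1:
        for i, e in enumerate(edges):
            others = edges[:i] + edges[i + 1:]
            shared = e & frozenset().union(*others)
            if any(shared <= f for f in others):
                del edges[i]  # e is an ear
                break
        else:
            return False      # no ear left, so the hypergraph is cyclic
    return True

print(is_alpha_acyclic([{"A", "B"}, {"B", "C"}, {"C", "D"}]))                   # True
print(is_alpha_acyclic([{"A", "B"}, {"B", "C"}, {"C", "A"}]))                   # False
print(is_alpha_acyclic([{"A", "B"}, {"B", "C"}, {"C", "A"}, {"A", "B", "C"}]))  # True
```

The last example is α-acyclic but not γ-acyclic: γ-acyclicity implies β-acyclicity, which requires every subset of the hyperedges to be α-acyclic, and the triangle subset is not. This sketch does not test γ-acyclicity.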


12 Computing Full Disjunctions

13 Product Graph
Given a query Q and a database D, the product of Q and D is a graph such that
– the nodes are pairs consisting of a node of Q and a node of D
– there is an edge labeled $l$ from $(q_1, d_1)$ to $(q_2, d_2)$ if Q has an edge labeled $l$ from $q_1$ to $q_2$ and D has an edge labeled $l$ from $d_1$ to $d_2$
– the root is the pair of the root of Q and the root of D
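A sketch of the product construction, assuming both Q and D are given as lists of (source, label, target) triples. Only the pairs that appear on some product edge (plus the root) matter in practice, so the sketch does not materialize the full set of node pairs; `reachable` shows that some product nodes cannot be reached from the root. The query and database fragment are hypothetical.

```python
from collections import defaultdict

def product_graph(q_edges, d_edges, q_root, d_root):
    """Edge from (q1, d1) to (q2, d2) labeled l when Q has q1 -l-> q2 and D has d1 -l-> d2."""
    by_label = defaultdict(list)
    for ds, label, dt in d_edges:
        by_label[label].append((ds, dt))
    edges = {((qs, ds), label, (qt, dt))
             for qs, label, qt in q_edges
             for ds, dt in by_label[label]}
    return edges, (q_root, d_root)

def reachable(edges, root):
    """Nodes of the product graph reachable from its root."""
    adjacency = defaultdict(list)
    for source, _, target in edges:
        adjacency[source].append(target)
    seen, stack = {root}, [root]
    while stack:
        for nxt in adjacency[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

# Hypothetical fragment: in Q, v1 -movie-> v2 -title-> w1.
Q = [("v1", "movie", "v2"), ("v2", "title", "w1")]
D = [(1, "movie", 2), (1, "movie", 3), (2, "title", 5), (3, "title", 8),
     (20, "movie", 21)]  # node 20 is not reachable from the database root 1
edges, root = product_graph(Q, D, "v1", 1)
print(sorted(reachable(edges, root)))
```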

14 (Figure: an example query with variables $v_1, v_2, v_3, w_1, w_2, w_3, w_4$ and a movie database with nodes 1–11 holding the values Zelig, 1983, Antz, 1998, English, Woody Allen and 1/12/1935; edges are labeled movie, actor, title, year, language, director, name, date of birth and filmography item.)
The product of the query and the database is the graph on the next slide.

15 (Figure: the product graph, with nodes $(v_1,1)$, $(v_2,2)$, $(v_2,3)$, $(v_3,4)$, $(w_1,5)$, $(w_2,6)$, $(w_1,8)$, $(w_3,10)$ and $(w_4,11)$ and edges labeled title, language, director, name, movie, date of birth, actor and filmography item.)
There are additional nodes that are not reachable from the root.

16 Matching as a Subgraph of the Product Graph
For a subgraph G of the product graph, consider the following conditions:
1. G has no repeated variables
2. G contains the root
3. Each node in G is reachable from the root
4. G preserves the constraints (edges) of the query
A subgraph that satisfies conditions 1–3 is an OR-matching graph; a subgraph that satisfies conditions 1–4 is a weak-matching graph.
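The four conditions can be checked mechanically. In this sketch a subgraph is a set of (variable, database node) pairs plus a set of product edges. Condition 4 is interpreted, as an assumption, to mean that whenever both endpoints of a query edge are bound, the corresponding edge exists in the product graph. The small query and product graph are hypothetical and mirror slides 17 and 18: the second subgraph is an OR-matching graph but not a weak-matching graph, because its director constraint fails.

```python
from collections import defaultdict

def _reachable(edges, root):
    adjacency = defaultdict(list)
    for s, _, t in edges:
        adjacency[s].append(t)
    seen, stack = {root}, [root]
    while stack:
        for nxt in adjacency[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def is_or_matching_graph(nodes, edges, root):
    variables = [var for var, _ in nodes]
    return (len(variables) == len(set(variables))       # 1. no repeated variables
            and root in nodes                           # 2. contains the root
            and _reachable(edges, root) >= set(nodes))  # 3. all nodes reachable

def is_weak_matching_graph(nodes, edges, root, q_edges, product_edges):
    if not is_or_matching_graph(nodes, edges, root):
        return False
    binding = dict(nodes)  # variable -> database node
    # 4. (assumed reading) every query edge between two bound variables
    #    has a matching edge in the product graph.
    return all(((x, binding[x]), label, (y, binding[y])) in product_edges
               for x, label, y in q_edges if x in binding and y in binding)

Q = [("v1", "movie", "v2"), ("v1", "actor", "v3"), ("v2", "director", "v3")]
P = {(("v1", 1), "movie", ("v2", 2)), (("v1", 1), "movie", ("v2", 3)),
     (("v1", 1), "actor", ("v3", 4)), (("v2", 2), "director", ("v3", 4))}
root = ("v1", 1)
g1_nodes = {("v1", 1), ("v2", 2), ("v3", 4)}
g1_edges = {(("v1", 1), "movie", ("v2", 2)), (("v1", 1), "actor", ("v3", 4))}
g2_nodes = {("v1", 1), ("v2", 3), ("v3", 4)}
g2_edges = {(("v1", 1), "movie", ("v2", 3)), (("v1", 1), "actor", ("v3", 4))}
print(is_weak_matching_graph(g1_nodes, g1_edges, root, Q, P))  # True
print(is_or_matching_graph(g2_nodes, g2_edges, root))          # True
print(is_weak_matching_graph(g2_nodes, g2_edges, root, Q, P))  # False
```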

17 (Figure: the product graph, with the subgraph on the nodes $(v_1,1)$, $(v_2,2)$, $(w_1,5)$, $(w_2,6)$, $(v_3,4)$, $(w_3,10)$ and $(w_4,11)$ highlighted.)
This subgraph is an OR-matching graph. It is also a weak-matching graph.

18 (Figure: the product graph, with another subgraph on the nodes $(v_1,1)$, $(v_2,3)$, $(w_1,8)$, $(v_3,4)$, $(w_3,10)$ and $(w_4,11)$ highlighted.)
This is another OR-matching graph. It is not a weak-matching graph, since the “director” edge of the query is not preserved.

19 Matching Graphs
Each OR-matching graph represents an OR-matching (and each weak-matching graph represents a weak matching).
An OR-matching can be represented by many OR-matching graphs, but all these graphs have the same set of nodes and differ only in their edges (the same is true for weak matchings and weak-matching graphs).

20 Intuition
For DAG queries, matching graphs are constructed by adding edges according to the query constraints
– the order of the extensions is simply given by a topological sort of the query nodes
For cyclic queries, neither a simple traversal over the query nor a simple traversal over the database will work
– instead, we use a stratum traversal over the matching graph

21 Dividing the Edges into Strata
(Figure: the product graph, with its edges divided into stratum 1, stratum 2, stratum 3, ….)

22 Stratum Traversal
A stratum traversal is an ordered list that
– starts with the edges of stratum 1
– followed by the edges of stratum 2
– …
– followed by the edges of stratum n
– …
The order of the edges within each stratum is unimportant.
There can be multiple occurrences of the same edge in different strata.
We only look at the first n strata, where n is the size of the query.
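The slides do not spell out how strata are formed, so the sketch below assumes one plausible reading that is consistent with the properties listed above: stratum $k$ holds every edge that can be the $k$-th edge of a walk starting at the root. On a cyclic graph the same edge then reappears in later strata. Edges are (source, label, target) triples, and the example graph is hypothetical.

```python
from collections import defaultdict

def stratum_traversal(edges, root, n):
    """Assumed definition: stratum k = edges that can be the k-th edge of a walk
    from the root. Returns strata 1..n concatenated in order."""
    outgoing = defaultdict(list)
    for edge in edges:
        outgoing[edge[0]].append(edge)
    traversal, frontier = [], [root]
    for _ in range(n):
        stratum = [e for node in frontier for e in outgoing[node]]
        traversal.extend(stratum)  # the order inside a stratum is arbitrary
        frontier = list(dict.fromkeys(target for _, _, target in stratum))
    return traversal

# A cyclic example: r -a-> x -b-> y -c-> x
E = [("r", "a", "x"), ("x", "b", "y"), ("y", "c", "x")]
print(stratum_traversal(E, "r", 4))
# [('r', 'a', 'x'), ('x', 'b', 'y'), ('y', 'c', 'x'), ('x', 'b', 'y')]
```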

23 Computing the OR-Matching Graphs
A set of OR-matching graphs is created. Each OR-matching graph in the set is extended by adding edges according to the stratum traversal. Initially, the set includes a single graph that consists only of the root of the product graph. In each extension step, we try to add the current edge to the graphs that were produced so far, and this may cause
– the creation of a new graph that replaces the extended graph
– the creation of a new graph that is added to the set of graphs in addition to the existing graphs
– no change to the set of graphs

24 Adding an Edge
After each addition of an edge, subsumed matching graphs are removed, to avoid an exponential blowup. There are six cases that should be handled; the cases of extending a graph by an edge are described next.

25 Case 1: The target of the added edge is a node whose pair includes the root of Q but not the root of D. No change is made.
(Figure: a graph with nodes $(v_1,o_1)$, $(v_2,o_2)$ and $(v_3,o_4)$ and edges labeled movie and actor; the added edge, labeled title, goes from $(v_2,o_2)$ to $(v_1,o_3)$.)
Case 2: The graph already includes the added edge. No change is made.
(Figure: the same graph; the added edge is its movie edge from $(v_1,o_1)$ to $(v_2,o_2)$.)

26 Case 3: The graph does not include the source of the added edge. No change is made.
(Figure: the graph with nodes $(v_1,o_1)$, $(v_2,o_2)$ and $(v_3,o_4)$; the added edge, labeled title, goes from $(v_2,o_3)$ to $(w_1,o_8)$.)
Case 4: The graph includes the source of the added edge and has no node with the variable of the target. The edge is added to the graph, and the new graph replaces the existing graph.
(Figure: the added edge, labeled title, goes from $(v_2,o_2)$ to $(w_1,o_5)$; the resulting graph contains $(w_1,o_5)$.)

27 Case 5: The graph already includes the source and the target of the added edge but does not include the added edge itself. The edge is added to the graph, and the new graph replaces the existing graph.
(Figure: the graph contains $(v_1,o_1)$, $(v_2,o_2)$, $(v_3,o_4)$ and a title edge to $(w_1,o_3)$; the added edge, labeled a.k.a, goes from $(v_2,o_2)$ to $(w_1,o_3)$.)

28 Case 6: The graph includes the source of the added edge but also includes a node with the same variable as the target of the added edge. A new graph is created and added to the set, without replacing the existing graph.
(Figure: the graph has nodes $(v_1,o_1)$, $(v_2,o_2)$, $(v_3,o_4)$ and $(w_1,o_3)$; the added edge, labeled film, goes from $(v_3,o_4)$ to $(v_2,o_4)$, which would give two different nodes with the same variable $v_2$.)
In the new graph, $(v_2,o_2)$ is replaced by $(v_2,o_4)$, and nodes that are no longer reachable from the root are erased.
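The extension procedure of slides 23–28 can be put together in one sketch. Several details are assumptions, not statements from the slides: a graph is a pair (frozenset of nodes, frozenset of edges); strata follow the walk-based reading (stratum $k$ = edges that can be the $k$-th edge of a walk from the root); a graph is treated as subsumed when its node set is strictly contained in another graph's node set; and graphs with equal node sets, which represent the same matching, are merged by uniting their edges. This is an illustration, not the authors' implementation. The example product graph is hypothetical and yields two OR-matchings, one with $(v_2,2)$ and one with $(v_2,3)$.

```python
from collections import defaultdict

def reachable(edges, root):
    adjacency = defaultdict(list)
    for source, _, target in edges:
        adjacency[source].append(target)
    seen, stack = {root}, [root]
    while stack:
        for nxt in adjacency[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def stratum_traversal(edges, root, n):
    """Assumed: stratum k = edges that can be the k-th edge of a walk from the root."""
    outgoing = defaultdict(list)
    for edge in edges:
        outgoing[edge[0]].append(edge)
    traversal, frontier = [], [root]
    for _ in range(n):
        stratum = [e for node in frontier for e in outgoing[node]]
        traversal.extend(stratum)
        frontier = list(dict.fromkeys(t for _, _, t in stratum))
    return traversal

def extend(graph, edge, root):
    """Graphs that take the place of `graph` after trying to add `edge` (cases 1-6)."""
    nodes, edges = graph
    source, _, target = edge
    if target[0] == root[0] and target != root:   # case 1: Q's root with a non-root object
        return [graph]
    if edge in edges or source not in nodes:       # cases 2 and 3: no change
        return [graph]
    same_var = [n for n in nodes if n[0] == target[0]]
    if not same_var:                               # case 4: add edge and target, replace
        return [(nodes | {target}, edges | {edge})]
    if target in nodes:                            # case 5: add the edge, replace
        return [(nodes, edges | {edge})]
    # case 6: keep the graph and add a copy in which the old node is replaced by target
    old = same_var[0]
    kept = frozenset(e for e in edges if old not in (e[0], e[2])) | {edge}
    live = reachable(kept, root)                   # erase nodes no longer reachable
    new_nodes = ((nodes - {old}) | {target}) & live
    new_edges = frozenset(e for e in kept if e[0] in live and e[2] in live)
    return [graph, (new_nodes, new_edges)]

def prune(graphs):
    """Drop graphs whose node set is strictly inside another's; merge equal node sets."""
    merged = {}
    for nodes, edges in graphs:
        merged[nodes] = merged.get(nodes, frozenset()) | edges
    return [(n, e) for n, e in merged.items() if not any(n < m for m in merged)]

def or_matchings(product_edges, root, n):
    graphs = [(frozenset({root}), frozenset())]
    for edge in stratum_traversal(product_edges, root, n):
        graphs = prune([g for old in graphs for g in extend(old, edge, root)])
    return [nodes for nodes, _ in graphs]

P = [(("v1", 1), "movie", ("v2", 2)), (("v1", 1), "movie", ("v2", 3)),
     (("v1", 1), "actor", ("v3", 4)), (("v2", 2), "director", ("v3", 4))]
for matching in or_matchings(P, ("v1", 1), 3):
    print(sorted(matching))
```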

29 Applying the Algorithm to the Movies Example
(Figure: steps 1–3, starting from the graph that contains only the root $(v_1,1)$ and adding the movie edges to $(v_2,2)$ and $(v_2,3)$.)

30 (Figure: step 4 adds the actor edge from $(v_1,1)$ to $(v_3,4)$; step 5 adds the title edge from $(v_2,2)$ to $(w_1,5)$.)

31 (Figure: step 6 adds the language edge from $(v_2,2)$ to $(w_2,6)$; step 7 adds another title edge.)

32 (Figure: step 8 adds the name edge from $(v_3,4)$ to $(w_3,10)$; step 9 adds the date of birth edge from $(v_3,4)$ to $(w_4,11)$.)

33 (Figure: step 10 adds the director edge from $(v_2,2)$ to $(v_3,4)$.)

34 (Figure: step 11 adds the filmography item edge from $(v_3,4)$ to $(v_2,2)$; a resulting graph is subsumed by the left matching graph.)

35 (Figure: step 12 adds the filmography item edge from $(v_3,4)$ to $(v_2,3)$; a resulting graph is subsumed by the right matching graph.)

36 The Product Graph and the OR-Matchings
(Figure: the product graph next to the two OR-matchings computed by the algorithm, one containing $(v_2,2)$ and the other containing $(v_2,3)$.)

37 Computing Maximal Weak-Matching Graphs
To compute the maximal weak-matching graphs, the same algorithm is used with a slight change. After each addition of an edge, the nodes that cause a query constraint not to be preserved are removed (along with the edges incident to these nodes). In addition, nodes that this deletion makes unreachable from the root are also deleted.
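The extra pruning step for weak matchings can be sketched with two helpers: one lists the query constraints a graph violates, and one removes a node with its incident edges and then erases everything no longer reachable from the root. The slides do not say which endpoint of a violated constraint counts as the node that "causes" the violation; the usage below assumes, hypothetically, that it is the node added last. The query and product graph are the same hypothetical ones used for the OR/weak matching example.

```python
from collections import defaultdict

def reachable(edges, root):
    """Nodes reachable from `root` along (source, label, target) edges."""
    adjacency = defaultdict(list)
    for source, _, target in edges:
        adjacency[source].append(target)
    seen, stack = {root}, [root]
    while stack:
        for nxt in adjacency[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def violated(nodes, q_edges, product_edges):
    """Query edges between two bound variables with no matching product edge."""
    binding = dict(nodes)  # variable -> database node (no repeated variables)
    return [(x, label, y) for x, label, y in q_edges
            if x in binding and y in binding
            and ((x, binding[x]), label, (y, binding[y])) not in product_edges]

def drop_node(graph, node, root):
    """Remove `node` and its edges, then erase nodes unreachable from the root."""
    nodes, edges = graph
    edges = {e for e in edges if node not in (e[0], e[2])}
    live = reachable(edges, root)
    return ({n for n in nodes if n != node and n in live},
            {e for e in edges if e[0] in live and e[2] in live})

Q = [("v1", "movie", "v2"), ("v1", "actor", "v3"), ("v2", "director", "v3")]
P = {(("v1", 1), "movie", ("v2", 2)), (("v1", 1), "movie", ("v2", 3)),
     (("v1", 1), "actor", ("v3", 4)), (("v2", 2), "director", ("v3", 4))}
g2 = ({("v1", 1), ("v2", 3), ("v3", 4)},
      {(("v1", 1), "movie", ("v2", 3)), (("v1", 1), "actor", ("v3", 4))})
print(violated(g2[0], Q, P))  # [('v2', 'director', 'v3')]
# Assumption: blame the node added last, here ("v3", 4).
print(drop_node(g2, ("v3", 4), ("v1", 1)))
```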

38 The Algorithm Computes Weak Queries in Polynomial Time
Theorem. Given a query Q and a database D, the revised algorithm terminates with the set of maximal weak-matching graphs of Q w.r.t. D. The runtime of the algorithm is $O(q^3 d m^2)$, where q is the size of the query, d is the size of the database and m is the size of the result.

