1
Computing Full Disjunctions
Yaron Kanza and Yehoshua Sagiv
The Selim and Rachel Benin School of Engineering and Computer Science, The Hebrew University of Jerusalem
2
Overview of the Talk
OR-semantics and weak semantics for querying incomplete data
Complexity of query evaluation
Full disjunctions as a special case of weak semantics
Generalizing full disjunctions: the join constraints are not restricted to equality constraints
Lower bounds for some related problems
3
Querying Incomplete Data Requires a Special Semantics
Usually, answers to a query are complete assignments of database objects (or values) to the query variables.
Consequently, partial information is lost. For example, dangling tuples are lost when joining several relations.
The purpose of outerjoins and full disjunctions is to solve this problem, i.e., answers can be partial assignments (to some of the variables).
4
Querying Incomplete Semistructured Data
In semistructured data, incompleteness of data is prevalent.
OR-semantics and weak semantics were introduced so that queries over semistructured data would return maximal answers rather than complete answers [Kanza, Nutt & Sagiv 1999].
5
In the Semistructured Data Model
Both data and queries are labeled rooted directed graphs.
Query nodes are variables.
Database nodes are objects.
Matchings are assignments of database objects to query variables, such that the database root is assigned to the query root and labels are preserved.
6
A Semistructured Database About Movies
[Figure: a semistructured database about movies. The root (node 1) has two movie edges and one actor edge, leading to nodes 2, 3 and 4. Further edges are labeled title, name, year, language, date of birth, director and acted in. Atomic values: Zelig, Antz, Woody Allen, 1/12/1935, 1983, 1998, English.]
7
A Query
[Figure: the query graph, with variables v1, v2, v3, w1, w2, w3, w4 and edges labeled movie, actor, title, director, name, language, date of birth and acted in.]
Under complete semantics, the query returns actor-movie pairs such that the actor played in the movie and was also the director of the movie.
8
A complete matching of the query variables to database objects
[Figure: the movie database next to the query. Every query variable is mapped to a database object; the matched objects are nodes 1, 2, 4, 5, 6, 10 and 11.]
9
Constraints on Complete Matchings
The root constraint is satisfied if the query root is mapped to the database root.
A query edge is an edge constraint: a query edge with a label l is satisfied if it is mapped to a database edge with the same label l.
[Figure: the query root r is mapped to the database root 1; a query edge labeled l between x and y is mapped to a database edge labeled l between nodes 9 and 11.]
10
Suppose that Node 6 is missing
[Figure: the movie database with node 6 (the language value English) and its incoming language edge removed.]
11
An incomplete matching
[Figure: the movie database without node 6, next to the query. The matched objects are nodes 1, 2, 4, 5, 10 and 11; the variable w2 is mapped to null.]
This matching is maximal.
12
The Reachability Constraint on Partial Matchings
A query node v that is mapped to a database object o satisfies the reachability constraint if there is a path from the query root to v, such that all edge constraints along this path are satisfied.
[Figure: a query with nodes r, x, y, z, w and v and edges labeled l1 to l6, shown next to a database and a mapping between them.]
13
Weak Satisfaction of Edge Constraints
An edge constraint is weakly satisfied if it is either satisfied (as defined earlier), or one (or more) of its nodes is mapped to a null value.
[Figure: examples of a query edge labeled l between x and y, with database nodes 9 and 11 and a null value.]
14
Weak Matchings
A partial matching is a weak matching if:
The root constraint is satisfied.
The reachability constraint is satisfied by every query node that is mapped to a database node.
Every edge constraint is weakly satisfied.
15
A weak matching
[Figure: the movie database without node 6, next to the query. The matched objects are nodes 1, 2, 4, 5, 10 and 11; the variable w2 is mapped to null.]
16
A Movie Database
Consider the case where the director edge is missing.
[Figure: the movie database with the director edge removed.]
17
An incomplete matching that is not a weak matching
[Figure: the movie database without the director edge, next to the query. The matched objects are nodes 1, 2, 4, 5, 10 and 11; the variable w2 is mapped to null.]
There is an edge that is not weakly satisfied: the director edge of the query connects two variables that are both mapped to database objects, but the database has no matching director edge.
18
OR Matchings
A partial matching is an OR matching if:
The root constraint is satisfied.
The reachability constraint is satisfied by every query node that is mapped to a database node.
Unlike in a weak matching, an edge constraint in an OR matching does not have to be weakly satisfied.
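The definitions of weak matchings and OR matchings can be checked mechanically. The following is a minimal sketch, not the authors' implementation. It assumes a database and a query given as sets of labeled edges and a matching given as a dictionary in which None stands for null. The tiny database, the query and all function names are hypothetical and only loosely modeled on the movie example.

```python
from typing import Dict, List, Optional, Set, Tuple

QueryEdge = Tuple[str, str, str]     # (source variable, label, target variable)
DbEdge = Tuple[int, str, int]        # (source object, label, target object)
Matching = Dict[str, Optional[int]]  # None stands for null


def edge_satisfied(edge: QueryEdge, m: Matching, db_edges: Set[DbEdge]) -> bool:
    """An edge constraint is satisfied if it is mapped to a database edge with the same label."""
    u, label, v = edge
    if m.get(u) is None or m.get(v) is None:
        return False
    return (m[u], label, m[v]) in db_edges


def reachable(q_root: str, q_edges: List[QueryEdge], m: Matching,
              db_edges: Set[DbEdge]) -> Set[str]:
    """Variables reachable from the query root along satisfied edge constraints."""
    seen, frontier = {q_root}, [q_root]
    while frontier:
        u = frontier.pop()
        for edge in q_edges:
            if edge[0] == u and edge[2] not in seen and edge_satisfied(edge, m, db_edges):
                seen.add(edge[2])
                frontier.append(edge[2])
    return seen


def is_or_matching(q_root, q_edges, m, db_root, db_edges) -> bool:
    """Root constraint plus reachability of every variable mapped to an object."""
    if m.get(q_root) != db_root:
        return False
    reach = reachable(q_root, q_edges, m, db_edges)
    return all(var in reach for var, obj in m.items() if obj is not None)


def is_weak_matching(q_root, q_edges, m, db_root, db_edges) -> bool:
    """An OR matching in which every edge constraint is also weakly satisfied."""
    if not is_or_matching(q_root, q_edges, m, db_root, db_edges):
        return False
    for edge in q_edges:
        u, _, v = edge
        if m.get(u) is None or m.get(v) is None:
            continue  # weakly satisfied: an endpoint is null
        if not edge_satisfied(edge, m, db_edges):
            return False
    return True


# Hypothetical database: a root (1), a movie (2), an actor (3), no director edge.
db_edges = {(1, "movie", 2), (1, "actor", 3), (3, "acted in", 2)}
q_edges = [("v1", "movie", "v2"), ("v1", "actor", "v3"),
           ("v3", "acted in", "v2"), ("v2", "director", "v3")]

both = {"v1": 1, "v2": 2, "v3": 3}           # director edge is violated
movie_only = {"v1": 1, "v2": 2, "v3": None}  # actor replaced by null

or_only = is_or_matching("v1", q_edges, both, 1, db_edges)
weak_fails = is_weak_matching("v1", q_edges, both, 1, db_edges)
weak_ok = is_weak_matching("v1", q_edges, movie_only, 1, db_edges)
```

In this sketch the matching that assigns both the movie and the actor is an OR matching but not a weak matching, because the director edge joins two non-null nodes without being satisfied. Replacing the actor by null yields a weak matching.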
19
Maximal Matchings
Matchings can be represented as tuples (where numbers are object ids).
A matching t1 subsumes a matching t2 if t1 can be obtained from t2 by replacing some nulls in t2 with non-null values.
A matching is maximal if no other matching subsumes it.
A query result consists only of maximal matchings.
Example: t1 = (1, 5, 2, null) subsumes t2 = (1, null, 2, null).
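Subsumption and the removal of non-maximal matchings can be sketched in a few lines. This is an illustrative sketch only: matchings are Python tuples, None stands for null, and the function names are assumptions rather than part of the talk.

```python
from typing import List, Optional, Tuple

Matching = Tuple[Optional[int], ...]  # None stands for null


def subsumes(t1: Matching, t2: Matching) -> bool:
    """True if t1 is obtained from t2 by replacing some nulls with non-null values."""
    if len(t1) != len(t2) or t1 == t2:
        return False
    return all(b is None or a == b for a, b in zip(t1, t2))


def maximal(matchings: List[Matching]) -> List[Matching]:
    """Keep only the matchings that no other matching subsumes."""
    unique = list(dict.fromkeys(matchings))  # drop exact duplicates, keep order
    return [t for t in unique if not any(subsumes(u, t) for u in unique)]


t1 = (1, 5, 2, None)
t2 = (1, None, 2, None)
result = maximal([t1, t2])
```

The quadratic filter in `maximal` is the second step of the naïve evaluation strategy: it is only applied after all matchings have been computed.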
20
More Examples
21
The Movie Database Before the Removals
[Figure: the complete movie database, including node 6 (English) and the director edge.]
22
A complete matching
In the result, the actor must be both an actor in the movie and the director of the movie.
[Figure: the movie database next to the query. The matched objects are nodes 1, 2, 4, 5, 6, 10 and 11.]
The complete matching is also a maximal OR-matching and a maximal weak matching.
23
A second maximal weak matching
In the result, if the actor and the movie are assigned non-null values, then the actor must be both an actor in the movie and the director of the movie.
[Figure: the movie database next to the query. The matched objects are nodes 1, 3 and 8; the remaining variables are mapped to null.]
24
A maximal OR-matching
In the result, the actor either played in the movie, directed the movie, or is not related at all to the movie.
[Figure: the movie database next to the query. The matched objects are nodes 1, 3, 4, 8, 10 and 11; one variable is mapped to null.]
It is not a weak matching.
25
Complexity of Evaluating Maximal Weak Matchings and Maximal OR Matchings
26
Data Complexity
Under data complexity, the time complexity is a function of the size of the database.
27
Two Alternatives for Query Evaluation
A naïve algorithm computes all matchings and then removes subsumed matchings.
A better algorithm avoids computing all matchings; ideally, it computes only maximal matchings.
Under data complexity, both algorithms run in polynomial time.
28
Input-Output Complexity
Under input-output complexity, the time complexity is a function of the size of the query, the size of the database and the size of the result.
29
A Naïve Algorithm vs. A Better Algorithm
Under input-output (I-O) complexity, a naïve algorithm takes exponential time.
Is there a better algorithm that runs in polynomial time under I-O complexity?
The answer is positive for DAG queries [Kanza, Nutt & Sagiv 1999].
30
Cyclic Queries
Theorem: For a query Q and a database D, the set of all maximal weak matchings can be computed in O(q^3 d m^2) time, where q is the size of the query, d is the size of the database and m is the size of the result. Computing all maximal OR matchings has the same complexity.
31
Full Disjunctions
What is the full disjunction of a set of relations?
How are full disjunctions related to queries with incomplete answers?
32
The Full Disjunction of the Given Relations
[Tables: four relations. Movies (m-id, title, year, language) holds the movies Zelig, Antz, Armageddon and Fantasia, with m-ids 1 to 4. Actors (a-id, name, date-of-birth) holds (1, Woody Allen, 1/12/1935), (2, Bruce Willis, 19/3/1955) and (3, Julia Roberts, 28/10/1967). Acted-in (a-id, m-id, role) holds the roles Zelig, Z and Harry. Actors-that-Directed (a-id, m-id) holds one tuple.]
[Table: the full disjunction over (m-id, title, year, language, a-id, name, date-of-birth, role). It has five tuples: Woody Allen as Zelig in Zelig, Woody Allen as Z in Antz, Bruce Willis as Harry in Armageddon, Fantasia with nulls for the actor attributes, and Julia Roberts with nulls for the movie attributes.]
33
The Full Disjunction of the Given Relations
[Tables: the Movies relation and the full disjunction of the given relations, as on the preceding slide.]
This tuple will not be in the full disjunction: the Movies tuple (1, Zelig, 1983, English), padded with nulls for a-id, name, date-of-birth and role.
The full disjunction does not include subsumed tuples.
34
The Full Disjunction of the Given Relations
[Tables: the relations Movies, Actors, Acted-in and Actors-that-Directed and their full disjunction, as on the preceding slides.]
This tuple will not be in the full disjunction: the Actors tuple (3, Julia Roberts, 28/10/1967) combined with the Movies tuple (4, Fantasia, 1940, English).
The full disjunction does not include tuples that are based on Cartesian product rather than join.
35
In the Full Disjunction of a Given Set of Relations:
Every tuple of the input is a part of at least one tuple of the output.
Tuples are joined as in a natural join, padded with null values.
The result includes only “maximal connected portions”.
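The three properties above can be turned directly into a brute-force computation. This is a sketch for illustration and takes exponential time; it is not the polynomial-time algorithm of the talk. It assumes that tuples are Python dictionaries, that None stands for null, and that a set of tuples (at most one per relation) may be combined when the schemas involved are connected through shared attributes and every pair of tuples agrees on its shared attributes. The data is a reduced version of the movie example; the a-id values in Acted-in are an assumption.

```python
from itertools import combinations, product


def schemas_connected(schemas):
    """True if the schemas form a connected graph (edge = a shared attribute)."""
    seen, frontier = {0}, [0]
    while frontier:
        i = frontier.pop()
        for j in range(len(schemas)):
            if j not in seen and schemas[i] & schemas[j]:
                seen.add(j)
                frontier.append(j)
    return len(seen) == len(schemas)


def join_consistent(tuples):
    """True if every pair of tuples agrees on all shared attributes."""
    return all(a[k] == b[k]
               for a, b in combinations(tuples, 2)
               for k in a.keys() & b.keys())


def subsumes(t1, t2):
    """True if t1 differs from t2 only by replacing nulls with values."""
    return t1 != t2 and all(t2[k] is None or t1[k] == t2[k] for k in t2)


def full_disjunction(relations):
    """relations: dict mapping a relation name to a non-empty list of dict tuples."""
    names = list(relations)
    attrs = sorted({k for n in names for k in relations[n][0]})
    candidates = []
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            if not schemas_connected([set(relations[n][0]) for n in subset]):
                continue  # would be a Cartesian product, not a join
            for combo in product(*(relations[n] for n in subset)):
                if join_consistent(combo):
                    merged = dict.fromkeys(attrs)  # pad with nulls
                    for t in combo:
                        merged.update(t)
                    if merged not in candidates:
                        candidates.append(merged)
    # Keep only the maximal (non-subsumed) tuples.
    return [t for t in candidates if not any(subsumes(u, t) for u in candidates)]


relations = {
    "Actors": [{"a-id": 1, "name": "Woody Allen"},
               {"a-id": 2, "name": "Bruce Willis"},
               {"a-id": 3, "name": "Julia Roberts"}],
    "Acted-in": [{"a-id": 1, "m-id": 1, "role": "Zelig"},   # a-id values assumed
                 {"a-id": 1, "m-id": 2, "role": "Z"},
                 {"a-id": 2, "m-id": 3, "role": "Harry"}],
    "Movies": [{"m-id": 1, "title": "Zelig"}, {"m-id": 2, "title": "Antz"},
               {"m-id": 3, "title": "Armageddon"}, {"m-id": 4, "title": "Fantasia"}],
}
fd = full_disjunction(relations)
```

On this input the sketch yields five tuples: three fully joined tuples, Julia Roberts padded with nulls, and Fantasia padded with nulls. Julia Roberts and Fantasia are never combined, because Actors and Movies share no attribute.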
36
Motivation for Full Disjunctions
Full disjunctions were proposed by Galindo-Legaria as an alternative to outerjoins [SIGMOD’94].
Rajaraman and Ullman suggested using full disjunctions for information integration [PODS’96].
37
Computing Full Disjunctions for γ-acyclic Relation Schemas
Rajaraman and Ullman have shown how to evaluate the full disjunction by a sequence of natural outerjoins when the relation schemas are γ-acyclic.
Hence, the full disjunction can be computed in polynomial time, under input-output complexity, when the relation schemas are γ-acyclic.
38
Weak Semantics Generalizes Full Disjunctions
Relations can be converted into a semistructured database.
The full disjunction can be expressed as the union of several queries that are evaluated under weak semantics.
39
Example: Creating the Database
[Tables: Actors (a-id, name) with (1, Woody Allen), (2, Bruce Willis) and (3, Julia Roberts); Acted-in (a-id, m-id, role) with the roles Zelig, Z and Harry; Movies (m-id, title) with (1, Zelig), (2, Antz), (3, Armageddon) and (4, Fantasia).]
A node is created for each tuple.
Edges are added between connected tuples, in both directions.
A root r is added, and edges are added from the root to every node.
We use colors instead of labels.
[Figure: the resulting database graph.]
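The construction of the database graph can be sketched as follows. This is a hedged illustration, not the authors' code. It assumes that two tuples are "connected" when they come from different relations, share at least one attribute and agree on all shared attributes. Each edge is labeled with the relation name of its target node, which stands in for the colors of the slide. The function name and the a-id values in Acted-in are assumptions.

```python
from itertools import combinations


def build_database_graph(relations):
    """Return (nodes, edges); a node is (relation name, tuple index), the root is "r"."""
    nodes = [(name, i) for name, rows in relations.items() for i in range(len(rows))]
    edges = set()
    for node in nodes:
        # Edge from the root to every node, labeled by the node's relation.
        edges.add(("r", node[0], node))
    for (n1, i1), (n2, i2) in combinations(nodes, 2):
        if n1 == n2:
            continue
        t1, t2 = relations[n1][i1], relations[n2][i2]
        shared = t1.keys() & t2.keys()
        if shared and all(t1[k] == t2[k] for k in shared):
            # Connected tuples get an edge in each direction.
            edges.add(((n1, i1), n2, (n2, i2)))
            edges.add(((n2, i2), n1, (n1, i1)))
    return nodes, edges


relations = {
    "Actors": [{"a-id": 1, "name": "Woody Allen"},
               {"a-id": 2, "name": "Bruce Willis"},
               {"a-id": 3, "name": "Julia Roberts"}],
    "Acted-in": [{"a-id": 1, "m-id": 1, "role": "Zelig"},   # a-id values assumed
                 {"a-id": 1, "m-id": 2, "role": "Z"},
                 {"a-id": 2, "m-id": 3, "role": "Harry"}],
    "Movies": [{"m-id": 1, "title": "Zelig"}, {"m-id": 2, "title": "Antz"},
               {"m-id": 3, "title": "Armageddon"}, {"m-id": 4, "title": "Fantasia"}],
}
nodes, edges = build_database_graph(relations)
```

With this input there are ten tuple nodes, ten root edges and twelve edges between connected tuples. The Julia Roberts node and the Fantasia node are attached to the root only.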
40
Example: Creating the Queries
[Tables: the relations Actors, Acted-in and Movies, as on the preceding slide.]
A node is created for each relation schema.
Edges are added between connected schemas, in both directions.
The number of queries is equal to the number of schemas.
In each query, the root is connected to a different schema.
[Figure: a query with root r and nodes Movies, Actors and Acted-in.]
41
Example: Queries are Evaluated under Weak Semantics
[Figure: the relations Actors, Acted-in and Movies, a query with root r over the nodes Acted-in, Actors and Movies, and result tables over (a-id, name, m-id, title, role). The first result tuple is Woody Allen in the role Zelig.]
42
Example: Queries are Evaluated under Weak Semantics
[Figure: as before; a second result tuple is added, with the role Z in Antz.]
43
Example: Queries are Evaluated under Weak Semantics
[Figure: as before; a third result tuple is added, with Bruce Willis in the role Harry in Armageddon.]
44
Example: Queries are Evaluated under Weak Semantics
[Figure: as before; a matching that maps the Actors node to Julia Roberts and the Acted-in and Movies nodes to null adds Julia Roberts to the result.]
45
Example: Queries are Evaluated under Weak Semantics
[Figure: as before; the result tables after the tuple for Julia Roberts has been added.]
46
Example: Queries are Evaluated under Weak Semantics
[Figure: as before; a matching that maps the Movies node to Fantasia and the Acted-in and Actors nodes to null adds Fantasia (m-id 4) to the result, which completes the full disjunction.]
47
The Algorithm Computes Full Disjunctions in Polynomial Time Under Input-Output Complexity
Theorem: The full disjunction of relations r1, …, rn can be computed in O(n^5 s^2 f^2) time, where n is the number of relations, s is the total size of all the relations and f is the size of the result.
48
Generalizing Full Disjunctions
In a full disjunction, tuples are joined according to equality constraints, as in a natural join (or equi-join).
We can generalize full disjunctions to support constraints that are not merely equality among attributes.
49
Example
Movies (m-id, title, year, language, location)
Actors (a-id, name, date-of-birth)
Acted-in (a-id, m-id, role)
Actors-that-Directed (a-id, m-id)
Historical-Events (name, date, description)
Historical-Sites (Country, State, City, Site)
Constraint: the date of the historical event is a date in the year when the movie was released.
Constraint: the filming location is near the historical site.
50
The General Idea
A set of constraints specifies how tuples should be joined.
The queries and the database are constructed according to the given constraints.
A pair of nodes is connected by an edge when it satisfies the corresponding constraint.
Queries are evaluated w.r.t. the database under weak semantics.
51
Another Way of Generalizing Full Disjunctions: Use OR-Semantics
Generate the queries and the database as before, but evaluate the queries under OR-semantics (rather than weak semantics).
This relaxes the requirement that every pair of tuples should be join consistent.
Instead, a tuple of the full disjunction is only required to be generated by database tuples that form a connected subgraph, but need not be pairwise join consistent.
52
Example: The Full Disjunction
Employees (e-id, ename, city, dept-no)
Departments (dept-no, dname, building)
Located-in (building, city, street)
Employee: (007, James Bond, London, 6)
Department: (6, MI-6, 10)
Located-in: (10, Liverpool, King)
[Table: the full disjunction over (e-id, ename, dept-no, dname, building, city, street). The Employees tuple and the Located-in tuple disagree on city, so the three tuples are not combined into a single tuple.]
53
The Full Disjunction under OR-Semantics
Example
Employees (e-id, ename, city, dept-no)
Departments (dept-no, dname, building)
Located-in (building, city, street)
Employee: (007, James Bond, London, 6)
Department: (6, MI-6, 10)
Located-in: (10, Liverpool, King)
[Table: the full disjunction under OR-semantics over (e-id, ename, dept-no, dname, building, city, street). The three tuples form a connected subgraph, so they are combined into a single tuple even though London and Liverpool disagree on city.]
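The difference between the two semantics on the James Bond tuples comes down to two checks: pairwise join consistency (required under weak semantics) and connectivity of the tuples (sufficient under OR-semantics). The sketch below evaluates both. The helper names are assumptions, and tuples are modeled as Python dictionaries.

```python
from itertools import combinations

employee = {"e-id": "007", "ename": "James Bond", "city": "London", "dept-no": 6}
department = {"dept-no": 6, "dname": "MI-6", "building": 10}
located_in = {"building": 10, "city": "Liverpool", "street": "King"}
tuples = [employee, department, located_in]


def consistent(a, b):
    """True if a and b agree on every attribute they share."""
    return all(a[k] == b[k] for k in a.keys() & b.keys())


def joinable(a, b):
    """True if a and b share at least one attribute and agree on all shared ones."""
    return bool(a.keys() & b.keys()) and consistent(a, b)


def connected(ts):
    """True if the graph with an edge between every joinable pair is connected."""
    seen, frontier = {0}, [0]
    while frontier:
        i = frontier.pop()
        for j in range(len(ts)):
            if j not in seen and joinable(ts[i], ts[j]):
                seen.add(j)
                frontier.append(j)
    return len(seen) == len(ts)


pairwise_consistent = all(consistent(a, b) for a, b in combinations(tuples, 2))
forms_connected_subgraph = connected(tuples)
```

Here `pairwise_consistent` is False, because the employee's city and the building's city differ, while `forms_connected_subgraph` is True, because the department tuple links the other two. The three tuples may therefore generate a single tuple only under OR-semantics.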
54
Two Related Problems
The Projection Problem: computing the projection of the full disjunction on a given set of attributes.
The Restriction Problem: computing only those tuples of the full disjunction that are non-null on a given set of attributes.
Neither the projection problem nor the restriction problem can be solved in polynomial time (under input-output complexity) unless P=NP.
55
Conclusion
Cyclic queries can be evaluated in polynomial time (in the size of the query, the database and the result) under either OR-semantics or weak semantics.
A reduction of full-disjunction evaluation to query evaluation under weak semantics was described.
Using the reduction, full disjunctions can be computed in polynomial time (in the size of the relation schemas, the relations and the result).
56
Conclusion (continued)
Full disjunctions can be generalized in two ways: by using OR-semantics instead of weak semantics, and by joining tuples according to general constraints.
Generalized full disjunctions can be useful in the context of data integration from heterogeneous sources.
The projection problem and the restriction problem have polynomial-time algorithms (under input-output complexity) when the relations have γ-acyclic schemas, but not in the general case (unless P=NP).
57
Thank You
Questions?