1
CS240A: Databases and Knowledge Bases
Carlo Zaniolo
Department of Computer Science
University of California, Los Angeles
December, 2001
Notes from the textbook Advanced Database Systems by Zaniolo, Ceri, Faloutsos, Snodgrass, Subrahmanian and Zicari, Morgan Kaufmann, 1997
2
A relational DB about students and courses

took
Name          Course  Grade
'Joe Doe'     cs123   2.7
'Jim Jones'   cs101   3.0
'Jim Jones'   cs143   3.3
'Jim Black'   cs143   3.3
'Jim Black'   cs101   2.7

student
Name          Major  Year
'Joe Doe'     cs     senior
'Jim Jones'   cs     junior
'Jim Black'   ee     junior

The same fact base for Datalog:
student('Joe Doe', cs, senior).
student('Jim Jones', cs, junior).
student('Jim Black', ee, junior).
took('Joe Doe', cs123, 2.7).
took('Jim Jones', cs101, 3.0).
took('Jim Jones', cs143, 3.3).
took('Jim Black', cs143, 3.3).
took('Jim Black', cs101, 2.7).
3
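Rules
How to express logical conjunction. Find the names of junior-level students who have taken both cs101 and cs143:
firstreq(Name) ← student(Name, Major, junior), took(Name, cs101, Grade1), took(Name, cs143, Grade2).
firstreq(Name) is the rule head; the goals to the right of ← form the rule body. Names beginning with an upper-case letter (Name, Major, Grade1) are variables, names beginning with a lower-case letter (junior, cs101) are constants, and _ denotes an anonymous variable. The commas in the body represent logical conjunction.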
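A minimal sketch of how such a conjunctive rule can be evaluated, assuming the two tables are encoded as Python sets of tuples (our own encoding, not part of the notes): each goal becomes a loop over a relation, the shared variable Name becomes equality tests, and each constant becomes a filter.

```python
# Fact base from the slides, encoded as Python sets of tuples (an assumed encoding).
student = {("Joe Doe", "cs", "senior"), ("Jim Jones", "cs", "junior"),
           ("Jim Black", "ee", "junior")}
took = {("Joe Doe", "cs123", 2.7), ("Jim Jones", "cs101", 3.0),
        ("Jim Jones", "cs143", 3.3), ("Jim Black", "cs143", 3.3),
        ("Jim Black", "cs101", 2.7)}

# firstreq(Name) <- student(Name, Major, junior),
#                   took(Name, cs101, Grade1), took(Name, cs143, Grade2).
# One loop per goal; the shared variable Name turns into equality tests,
# and the constants junior, cs101 and cs143 turn into filters.
firstreq = {name
            for (name, major, year) in student if year == "junior"
            for (n1, c1, grade1) in took if n1 == name and c1 == "cs101"
            for (n2, c2, grade2) in took if n2 == name and c2 == "cs143"}

print(sorted(firstreq))  # ['Jim Black', 'Jim Jones']
```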
4
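Disjunction
Junior-level students who took course cs131 or course cs151 with a grade better than 3.0:
scndreq(Name) ← took(Name, cs131, Grade), Grade > 3.0, student(Name, Major, junior).
scndreq(Name) ← took(Name, cs151, Grade), Grade > 3.0, student(Name, _, junior).
Disjunction is expressed by writing several rules with the same head: a name belongs to scndreq if it satisfies the body of either rule.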
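A sketch of the same idea over Python sets (an assumed encoding): each rule is evaluated separately and, because the rules share the head scndreq, their answers are combined by set union.

```python
student = {("Joe Doe", "cs", "senior"), ("Jim Jones", "cs", "junior"),
           ("Jim Black", "ee", "junior")}
took = {("Joe Doe", "cs123", 2.7), ("Jim Jones", "cs101", 3.0),
        ("Jim Jones", "cs143", 3.3), ("Jim Black", "cs143", 3.3),
        ("Jim Black", "cs101", 2.7)}


def scndreq(student, took):
    """Union of the two scndreq rules (cs131 OR cs151, grade > 3.0, junior)."""
    def rule(course):
        # scndreq(Name) <- took(Name, course, Grade), Grade > 3.0, student(Name, _, junior).
        return {name
                for (name, c, grade) in took if c == course and grade > 3.0
                for (n, _major, year) in student if n == name and year == "junior"}
    # Two rules with the same head: a name qualifies if either body is satisfied.
    return rule("cs131") | rule("cs151")


print(scndreq(student, took))  # nobody in the sample data took cs131 or cs151
```

Adding a hypothetical junior who took cs151 with a grade of 3.7 to both sets would make that student the only answer; a grade of exactly 3.0 would not qualify, since the comparison is strict.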
5
QUERIES
A closed query, such as ?firstreq('Jim Black'), contains no variables; its answer is either yes or no (here, yes).
An open query, such as ?firstreq(X), returns every binding that satisfies it:
firstreq('Jim Jones')
firstreq('Jim Black')
Rules can be cascaded, since derived predicates can be used in the bodies of other rules; this gives much power and convenience. For example, both requirements must be satisfied to enroll in cs298:
reqcs298(Name) ← firstreq(Name), scndreq(Name).
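A sketch under the same set-of-tuples encoding (an assumption): a closed query reduces to a membership test, an open query to the whole derived set, and the cascaded rule reqcs298 to an intersection, since its body is a conjunction of two derived predicates on the same variable.

```python
student = {("Joe Doe", "cs", "senior"), ("Jim Jones", "cs", "junior"),
           ("Jim Black", "ee", "junior")}
took = {("Joe Doe", "cs123", 2.7), ("Jim Jones", "cs101", 3.0),
        ("Jim Jones", "cs143", 3.3), ("Jim Black", "cs143", 3.3),
        ("Jim Black", "cs101", 2.7)}

firstreq = {name
            for (name, _major, year) in student if year == "junior"
            for (n1, c1, _g1) in took if n1 == name and c1 == "cs101"
            for (n2, c2, _g2) in took if n2 == name and c2 == "cs143"}

# Both scndreq rules at once: cs131 or cs151, grade > 3.0, junior.
scndreq = {name for (name, c, grade) in took
           if c in ("cs131", "cs151") and grade > 3.0
           for (n, _major, year) in student if n == name and year == "junior"}

# reqcs298(Name) <- firstreq(Name), scndreq(Name).
reqcs298 = firstreq & scndreq

# Closed query ?firstreq('Jim Black'): a yes/no answer.
print("yes" if "Jim Black" in firstreq else "no")  # yes
# Open query ?firstreq(X): all bindings for X.
print(sorted(firstreq))  # ['Jim Black', 'Jim Jones']
```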
6
The Relational Model versus Datalog
Terminology:
Relational Model      Datalog
Table or Relation     Base Predicate
Row or Tuple          Fact
Column                Argument
View                  Derived Predicate
7
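Negation in Datalog
Only goals can be negated; negated heads are not allowed.
Junior-level students who did not take course cs143:
hastaken(Name, Course) ← took(Name, Course, Grade).
lacks_cs143(Name) ← student(Name, _, junior), ¬hastaken(Name, cs143).
The auxiliary predicate hastaken projects out Grade: writing ¬took(Name, cs143, Grade) directly would leave Grade unbound in the negated goal.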
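A sketch of negated goals over Python sets, reading negation under the closed-world assumption (a fact absent from hastaken is false). The helper lacks(course) is our own hypothetical generalization of lacks_cs143 to any course.

```python
student = {("Joe Doe", "cs", "senior"), ("Jim Jones", "cs", "junior"),
           ("Jim Black", "ee", "junior")}
took = {("Joe Doe", "cs123", 2.7), ("Jim Jones", "cs101", 3.0),
        ("Jim Jones", "cs143", 3.3), ("Jim Black", "cs143", 3.3),
        ("Jim Black", "cs101", 2.7)}

# hastaken(Name, Course) <- took(Name, Course, Grade).  (Grade is projected out)
hastaken = {(name, course) for (name, course, _grade) in took}


def lacks(course):
    # Hypothetical helper: lacks(Name) <- student(Name, _, junior), not hastaken(Name, course).
    # Negation is a non-membership test on the fully bound pair (Name, course).
    return {name for (name, _major, year) in student
            if year == "junior" and (name, course) not in hastaken}


print(lacks("cs143"))  # set(): both juniors took cs143
```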
8
Universal Quantification by Double Negation
Find the senior students who completed all the requirements for the cs major: ?all_req_sat(X)
The first step is to formulate the complementary query: find the students who did not take some of the courses required for a cs major:
req_missing(Name) ← student(Name, _, senior), req(cs, Course), ¬hastaken(Name, Course).
We can now re-express the original query as: find the senior students who are NOT missing any requirement:
all_req_sat(Name) ← student(Name, _, senior), ¬req_missing(Name).
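A sketch of the double-negation pattern over Python sets. The notes do not list the req facts, so the requirement sets used below are hypothetical; the second negation becomes a set difference.

```python
student = {("Joe Doe", "cs", "senior"), ("Jim Jones", "cs", "junior"),
           ("Jim Black", "ee", "junior")}
took = {("Joe Doe", "cs123", 2.7), ("Jim Jones", "cs101", 3.0),
        ("Jim Jones", "cs143", 3.3), ("Jim Black", "cs143", 3.3),
        ("Jim Black", "cs101", 2.7)}

# Hypothetical requirement facts req(cs, Course); not given in the notes.
req = {("cs", "cs101"), ("cs", "cs143")}


def all_req_sat(req):
    hastaken = {(n, c) for (n, c, _g) in took}
    seniors = {n for (n, _m, y) in student if y == "senior"}
    # req_missing(Name) <- student(Name,_,senior), req(cs,Course), not hastaken(Name,Course).
    req_missing = {n for n in seniors for (major, course) in req
                   if major == "cs" and (n, course) not in hastaken}
    # all_req_sat(Name) <- student(Name,_,senior), not req_missing(Name).
    return seniors - req_missing


print(all_req_sat(req))  # set(): Joe Doe, the only senior, took neither course
```

With an empty requirement set every senior qualifies, which matches the logical reading: nobody can be missing a requirement that does not exist.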
9
Domain Relational Calculus
Relational calculus comes in two main flavors:
1. in the Domain Relational Calculus (DRC), variables denote values of attributes;
2. in the Tuple Relational Calculus (TRC), variables denote whole tuples.
In DRC, the query "Find the names of junior-level students who have taken both cs101 and cs143" is:
\[ \{ (N) \mid \exists G_1\,(\text{took}(N, \text{cs101}, G_1)) \land \exists G_2\,(\text{took}(N, \text{cs143}, G_2)) \land \exists M\,(\text{student}(N, M, \text{junior})) \} \]
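A sketch that mirrors the DRC formula with explicit quantifiers, using Python's any() for each existential. We assume that N ranges over the active domain (the names stored in the database), which keeps every quantifier finite.

```python
student = {("Joe Doe", "cs", "senior"), ("Jim Jones", "cs", "junior"),
           ("Jim Black", "ee", "junior")}
took = {("Joe Doe", "cs123", 2.7), ("Jim Jones", "cs101", 3.0),
        ("Jim Jones", "cs143", 3.3), ("Jim Black", "cs143", 3.3),
        ("Jim Black", "cs101", 2.7)}

# Assumed active domain for N: every name that appears in the database.
names = {n for (n, _, _) in student} | {n for (n, _, _) in took}


def exists_took(n, course):
    # EXISTS G took(n, course, G)
    return any(name == n and c == course for (name, c, _g) in took)


def exists_junior(n):
    # EXISTS M student(n, M, junior)
    return any(name == n and year == "junior" for (name, _m, year) in student)


answer = {n for n in names
          if exists_took(n, "cs101") and exists_took(n, "cs143") and exists_junior(n)}
print(sorted(answer))  # ['Jim Black', 'Jim Jones']
```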
10
Domain Relational Calculus (cont.)
The query ?scndreq(N) can be expressed as follows:
\[ \{ (N) \mid \exists G, M\,(\text{took}(N, \text{cs131}, G) \land G > 3.0 \land \text{student}(N, M, \text{junior})) \lor \exists G, M\,(\text{took}(N, \text{cs151}, G) \land G > 3.0 \land \text{student}(N, M, \text{junior})) \} \]
DRC presents several syntactic differences w.r.t. Datalog: set definition by abstraction (rather than rules), conjunctions and disjunctions in the same formula, nesting of parentheses, and explicit quantifiers.
11
Explicit Quantifiers
Existential and universal quantification are both allowed in DRC. A query such as ?all_req_sat(N) can be expressed either by using double negation (and only existential quantifiers) or directly by using the universal quantifier.
Example: find the seniors who completed all cs requirements:
\[ \{ (N) \mid \exists M\,(\text{student}(N, M, \text{senior})) \land \forall C\,(\text{req}(\text{cs}, C) \rightarrow \exists G\,(\text{took}(N, C, G))) \} \]
The implication sign is a shorthand: \( p \rightarrow q \) stands for \( \neg p \lor q \).
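A sketch of the universal quantifier using Python's all(), again with hypothetical req facts since the notes do not list them. The implication req(cs, C) → ∃G took(N, C, G) is encoded by quantifying only over the courses for which req(cs, C) holds; all() over an empty collection is True, matching \( \neg p \lor q \) when p is false.

```python
student = {("Joe Doe", "cs", "senior"), ("Jim Jones", "cs", "junior"),
           ("Jim Black", "ee", "junior")}
took = {("Joe Doe", "cs123", 2.7), ("Jim Jones", "cs101", 3.0),
        ("Jim Jones", "cs143", 3.3), ("Jim Black", "cs143", 3.3),
        ("Jim Black", "cs101", 2.7)}


def all_req_sat_forall(req):
    # {(N) | EXISTS M student(N, M, senior) AND
    #        FORALL C (req(cs, C) -> EXISTS G took(N, C, G))}
    seniors = {n for (n, _m, year) in student if year == "senior"}
    return {n for n in seniors
            if all(any(name == n and c == course for (name, c, _g) in took)
                   for (major, course) in req if major == "cs")}


# Hypothetical requirement sets.
print(all_req_sat_forall({("cs", "cs101"), ("cs", "cs143")}))  # set()
print(all_req_sat_forall({("cs", "cs123")}))                   # {'Joe Doe'}
```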
12
Tuple Relational Calculus (TRC)
In TRC, variables range over the tuples of a relation. For instance, the TRC expression for the query ?firstreq(N) is:
\[ \{ (t[1]) \mid \exists u\,\exists s\,(\text{took}(t) \land \text{took}(u) \land \text{student}(s) \land t[2] = \text{cs101} \land u[2] = \text{cs143} \land t[1] = u[1] \land s[3] = \text{junior} \land s[1] = t[1]) \} \]
The variables t and u denote tuples ranging over took, and s denotes tuples ranging over student. t[1] denotes the first component of t (corresponding to Name). TRC requires an explicit statement of equality (e.g., s[1] = t[1]), while in DRC equality is denoted implicitly by the presence of the same variable in different places.
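A sketch of the TRC formula in which the Python loop variables t, u and s stand for whole tuples, and every join condition is an explicit equality between components. Python indexes from 0, so TRC's t[1], t[2], t[3] become t[0], t[1], t[2].

```python
student = {("Joe Doe", "cs", "senior"), ("Jim Jones", "cs", "junior"),
           ("Jim Black", "ee", "junior")}
took = {("Joe Doe", "cs123", 2.7), ("Jim Jones", "cs101", 3.0),
        ("Jim Jones", "cs143", 3.3), ("Jim Black", "cs143", 3.3),
        ("Jim Black", "cs101", 2.7)}

# t and u range over took, s ranges over student (0-based component access).
answer = {t[0]
          for t in took for u in took for s in student
          if t[1] == "cs101" and u[1] == "cs143" and t[0] == u[0]
          and s[2] == "junior" and s[0] == t[0]}

print(sorted(answer))  # ['Jim Black', 'Jim Jones']
```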
13
Relational DB Languages
The various languages are quite different, but they have the same expressive power:
Safe TRC and DRC expressions are equivalent, and there are mappings that transform any formula in one language into an equivalent one in the other.
For each TRC or DRC formula there is an equivalent nonrecursive Datalog program. The converse is also true, since a nonrecursive Datalog program can be mapped into an equivalent DRC query.
Another language equivalent to these is relational algebra (RA). RA is an operator-based language, and thus provides a useful link to the concrete implementation of these logic-based languages.
Languages that can express every query expressible in these languages are called relationally complete. Relational completeness is necessary, but it is much less than Turing completeness, and it is no longer sufficient in the commercial world (hence the mission of CS240A).
14
Commercial DB Languages
The actual query languages of commercial RDBMSs are largely based on the formal query languages just discussed. For instance:
Query-By-Example (QBE) is a visual query language based on DRC.
Languages such as QUEL and SQL are instead based on TRC. In QUEL and SQL, the notation t.Name and t.Course is used instead of t[1] and t[2]; also, existential quantification is replaced by the RANGE construct in QUEL and by the FROM construct in SQL.
RA provides a good basis for the efficient implementation of these relational languages.
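A minimal sketch of the ?firstreq query in SQL, run through Python's built-in sqlite3 module (SQLite is our choice of engine, not one the notes discuss). The FROM clause declares the tuple variables t, u and s, and t.Name plays the role of TRC's t[1].

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (Name TEXT, Major TEXT, Year TEXT)")
conn.execute("CREATE TABLE took (Name TEXT, Course TEXT, Grade REAL)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)", [
    ("Joe Doe", "cs", "senior"), ("Jim Jones", "cs", "junior"),
    ("Jim Black", "ee", "junior")])
conn.executemany("INSERT INTO took VALUES (?, ?, ?)", [
    ("Joe Doe", "cs123", 2.7), ("Jim Jones", "cs101", 3.0),
    ("Jim Jones", "cs143", 3.3), ("Jim Black", "cs143", 3.3),
    ("Jim Black", "cs101", 2.7)])

# t, u and s are tuple variables; equalities between them are stated explicitly.
rows = conn.execute("""
    SELECT DISTINCT t.Name
    FROM took t, took u, student s
    WHERE t.Course = 'cs101' AND u.Course = 'cs143' AND t.Name = u.Name
      AND s.Year = 'junior' AND s.Name = t.Name
""").fetchall()
conn.close()

firstreq = {name for (name,) in rows}
print(sorted(firstreq))  # ['Jim Black', 'Jim Jones']
```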
15
Relational Algebra (RA)
A family of operators on relations that have the closure property: they take relations as arguments and return relations as results.
Union. The union of relations R and S, denoted \( R \cup S \), is the set of tuples that are in R, or in S, or in both:
\[ R \cup S = \{ t \mid t \in R \lor t \in S \} \]
This operation is defined only if R and S have the same number of columns.
Set difference. The tuples that belong to R but not to S:
\[ R - S = \{ t \mid t \in R \land \neg \exists r\,(r \in S \land t = r) \} \]
This operation is defined only if R and S have the same number of columns. Say that this number is n. Then \( t = r \) denotes \( t[1] = r[1] \land \dots \land t[n] = r[n] \).
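A sketch of union and set difference on relations encoded as Python sets of tuples (an assumed encoding, with toy relations R and S). Difference is written literally as "no tuple of S equals t". Both operations check that the two relations have the same number of columns.

```python
def _check_same_arity(r, s):
    # Union and difference are defined only for relations with the same number of columns.
    if len({len(t) for t in r} | {len(t) for t in s}) > 1:
        raise ValueError("relations must have the same number of columns")


def union(r, s):
    _check_same_arity(r, s)
    return set(r) | set(s)


def difference(r, s):
    _check_same_arity(r, s)
    # R - S = { t | t in R and not EXISTS x (x in S and t = x) }
    return {t for t in r if not any(t == x for x in s)}


R = {(1, "a"), (2, "b")}
S = {(2, "b"), (3, "c")}
print(sorted(union(R, S)))       # [(1, 'a'), (2, 'b'), (3, 'c')]
print(sorted(difference(R, S)))  # [(1, 'a')]
```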
16
Relational Operators
Cartesian product.
\[ R \times S = \{ t \mid (\exists r \in R)(\exists s \in S)\,(t[1, \dots, n] = r \land t[n+1, \dots, n+m] = s) \} \]
If R has n columns and S has m columns, then \( R \times S \) contains all the possible (n+m)-tuples whose first n components form a tuple in R and whose last m components form a tuple in S. Thus, \( R \times S \) has n+m columns and \( |R| \times |S| \) tuples, where \( |R| \) and \( |S| \) denote the respective cardinalities of the two relations.
Projection. Let L be a sublist of the columns of R (with possible reordering):
\[ \pi_L R = \{ r[L] \mid r \in R \} \]
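A sketch of Cartesian product and projection over Python sets of tuples. Column numbers are 1-based, as in the notes; because results are sets, projection also removes duplicate tuples.

```python
def cartesian_product(r, s):
    # Every tuple of R concatenated with every tuple of S.
    return {a + b for a in r for b in s}


def project(r, cols):
    # pi_L R with L a list of 1-based column numbers (reordering allowed).
    return {tuple(t[i - 1] for i in cols) for t in r}


R = {(1, "a"), (2, "b")}       # n = 2 columns
S = {("x",), ("y",), ("z",)}   # m = 1 column
RxS = cartesian_product(R, S)
print(len(RxS))                # 6, i.e. |R| * |S|
print(project(R, [2, 1]))      # columns swapped
```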
17
Relational Operators (cont.)
Selection. \( \sigma_F R \) denotes the selection on R according to the selection formula F, where F obeys one of the following patterns:
(i) \( \$i\ \theta\ C \), where i is a column of R, θ is an arithmetic comparison operator, and C is a constant, or
(ii) \( \$i\ \theta\ \$j \), where $i and $j are columns of R, and θ is an arithmetic comparison operator, or
(iii) an expression built from terms such as those described in (i) and (ii) above, and the logical connectives ∧, ∨, and ¬.
Then:
\[ \sigma_F R = \{ t \mid t \in R \land F' \} \]
where F' denotes the formula obtained from F by replacing $i and $j with t[i] and t[j]. For example, if F is "$2 = $3 ∧ $1 = bob", then F' is "t[2] = t[3] ∧ t[1] = bob". Thus:
\[ \sigma_{\$2 = \$3 \land \$1 = \text{bob}} R = \{ t \mid t \in R \land t[2] = t[3] \land t[1] = \text{bob} \} \]
All the operators above, except set difference, are monotonic.
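A sketch of selection in which the formula F' is passed as a Python function (an encoding of our own). The function receives col, where col(i) returns t[i] using 1-based numbering like $i.

```python
def select(r, formula):
    """sigma_F R = { t | t in R and F' }, with `formula` playing the role of F'."""
    result = set()
    for t in r:
        def col(i, t=t):
            return t[i - 1]  # $i refers to the i-th column, counting from 1
        if formula(col):
            result.add(t)
    return result


def bob_formula(col):
    # F = "$2 = $3 AND $1 = bob"
    return col(2) == col(3) and col(1) == "bob"


R = {("bob", 1, 1), ("bob", 1, 2), ("ann", 3, 3)}
print(select(R, bob_formula))  # {('bob', 1, 1)}
```

Because each tuple is kept or dropped on its own, adding tuples to R can only add tuples to the result: this is the monotonicity that set difference lacks.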
18
Additional Operators
Additional operators of frequent use can be derived from these; for instance, join, semijoin, intersection, division, and generalized projection.
The join operator \( R \bowtie_F S \) can be constructed using Cartesian product and selection:
\[ R \bowtie_F S = \sigma_{F'}(R \times S) \]
where \( F = \$i_1\,\theta_1\,\$j_1 \land \dots \land \$i_k\,\theta_k\,\$j_k \); \( i_1, \dots, i_k \) are columns of R; \( j_1, \dots, j_k \) are columns of S; and \( \theta_1, \dots, \theta_k \) are comparison operators. Then, if R has arity m, we define \( F' = \$i_1\,\theta_1\,\$(j_1 + m) \land \dots \land \$i_k\,\theta_k\,\$(j_k + m) \).
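A sketch of the construction: the join is computed as a selection over the Cartesian product, after shifting every column number of S by the arity m of R. The condition encoding, a list of (i, op, j) triples with comparisons from Python's operator module, is our own assumption.

```python
import operator


def theta_join(r, s, conditions):
    """R join_F S = sigma_F'(R x S).
    conditions: list of (i, op, j) with i a 1-based column of R, j a 1-based
    column of S, and op a comparison such as operator.eq or operator.lt."""
    if not r or not s:
        return set()
    m = len(next(iter(r)))  # arity of R
    # F': columns of S follow those of R in R x S, so $j becomes $(j + m).
    shifted = [(i, op, j + m) for (i, op, j) in conditions]
    return {t for t in (a + b for a in r for b in s)
            if all(op(t[i - 1], t[k - 1]) for (i, op, k) in shifted)}


took = {("Joe Doe", "cs123", 2.7), ("Jim Jones", "cs101", 3.0),
        ("Jim Jones", "cs143", 3.3), ("Jim Black", "cs143", 3.3),
        ("Jim Black", "cs101", 2.7)}
student = {("Joe Doe", "cs", "senior"), ("Jim Jones", "cs", "junior"),
           ("Jim Black", "ee", "junior")}

# Equijoin of took and student on the Name column ($1 = $1).
joined = theta_join(took, student, [(1, operator.eq, 1)])
print(len(joined))  # 5: each took tuple matches exactly one student tuple
```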
19
Additional Operators (cont.)
The intersection of two relations can be constructed either by taking the equijoin of the two relations on every column (and then projecting out duplicate columns) or by using the following property:
\[ R \cap S = R - (R - S) = S - (S - R) \]
The generalized projection of a relation R is denoted \( \pi_L(R) \), where L is a list of column numbers and constants. Unlike ordinary projection, components might appear more than once, and constants are permitted as components of the list L; e.g., \( \pi_{\$1, c, \$1} \) is a valid generalized projection.
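A sketch of intersection built from two differences, and of generalized projection. We assume that column references are written as strings "$i" and that anything else is a constant, so in this sketch a string constant may not itself start with "$".

```python
def intersection(r, s):
    # R ∩ S = R - (R - S), using only set difference.
    return r - (r - s)


def generalized_project(r, items):
    """items mixes column references "$i" (1-based) and constants, e.g. ["$1", "c", "$1"]."""
    def component(t, item):
        if isinstance(item, str) and item.startswith("$"):
            return t[int(item[1:]) - 1]
        return item
    return {tuple(component(t, item) for item in items) for t in r}


R = {(1, "a"), (2, "b")}
S = {(2, "b"), (3, "c")}
print(intersection(R, S))                         # {(2, 'b')}
print(generalized_project(R, ["$1", "c", "$1"]))  # a column repeated, plus a constant
```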
20
Unsafe Rules
An unsafe rule: to find grades better than the grade Joe Doe got in cs143, a user might write
bettergrade(G1) ← took('Joe Doe', cs143, G), G1 > G.
Infinite answers. Assuming, say, that Joe Doe got the grade of 3.3 (i.e., B+) in course cs143, there are infinitely many numbers that satisfy the condition of being greater than 3.3.
Lack of domain independence. A query formula is said to be domain independent when its answer depends only on the database and the constants in the query, and not on the domain of interpretation. The set of values for G1 satisfying the rule above depends on what domain we assume for numbers, e.g., integer, rational, or real.
No relational algebra equivalent. RA expressions take DB tables and constants as operands and return finite relations.
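A sketch of the lack of domain independence. Over the reals the answer would be infinite, so the sketch can only try two finite domains of our own choosing; the took fact follows the slide's assumption that Joe Doe got 3.3 in cs143.

```python
# Hypothetical fact, following the slide's assumption.
took = {("Joe Doe", "cs143", 3.3)}


def bettergrade(domain):
    # bettergrade(G1) <- took('Joe Doe', cs143, G), G1 > G.
    # G1 occurs in no database goal, so its values must be drawn from an
    # externally chosen domain: the answer depends on that choice.
    return {g1
            for (name, course, g) in took if name == "Joe Doe" and course == "cs143"
            for g1 in domain if g1 > g}


letter_grade_points = {4.0, 3.7, 3.3, 3.0, 2.7, 2.3, 2.0}  # an assumed finite domain
small_integers = set(range(11))                            # another assumed domain

print(sorted(bettergrade(letter_grade_points)))  # [3.7, 4.0]
print(sorted(bettergrade(small_integers)))       # [4, 5, 6, 7, 8, 9, 10]
```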
21
Safety
In practical languages, it is desirable to allow only safe formulas, which avoid the problems of infinite answers and loss of domain independence. But the problems of domain independence and finiteness of answers are undecidable even for nonrecursive queries. Therefore, necessary and sufficient syntactic conditions that characterize safe formulas cannot be given in general. In practice, sufficient conditions are defined instead, and these may be more restrictive than necessary.
22
Safe Datalog: an inductive definition
1. Safe predicates. A predicate q of P is safe if (i) q is a database predicate, or (ii) every rule defining q is safe.
2. Safe variables. A variable X in rule r is safe if (i) X is contained in some positive goal \( q(t_1, \dots, t_n) \), where the predicate \( q(A_1, \dots, A_n) \) is safe, or (ii) r contains some equality goal X = Y, where Y is safe.
3. Safe rules. A rule r is safe if all its variables are safe.
4. Safe goals. The goal \( ?q(t_1, \dots, t_n) \) is safe when the predicate \( q(A_1, \dots, A_n) \) is safe.
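A sketch of the variable and rule conditions (points 2 and 3) as a checker. The Rule representation is hypothetical: positive goals are (predicate, arguments) pairs, equality goals are (X, Y) pairs, and the terms of comparison or negated goals are listed separately because they never make a variable safe. The set of safe predicates (point 1) is taken as an input rather than computed.

```python
from dataclasses import dataclass, field


def is_var(term):
    # Slide convention: variables start with an upper-case letter.
    # Anonymous variables ("_") are not handled in this sketch.
    return term[:1].isupper()


@dataclass
class Rule:
    head: tuple                                      # head arguments
    positive: list                                   # positive goals as (predicate, args)
    equalities: list = field(default_factory=list)   # equality goals X = Y as (X, Y)
    other_terms: list = field(default_factory=list)  # terms of comparison / negated goals


def safe_variables(rule, safe_preds):
    # Condition (i): variables in a positive goal whose predicate is safe.
    safe = {a for pred, args in rule.positive if pred in safe_preds
            for a in args if is_var(a)}
    # Condition (ii): X = Y with Y safe makes X safe (equality is symmetric);
    # repeat until nothing changes.
    changed = True
    while changed:
        changed = False
        for x, y in rule.equalities:
            for a, b in ((x, y), (y, x)):
                if is_var(a) and a not in safe and b in safe:
                    safe.add(a)
                    changed = True
    return safe


def rule_is_safe(rule, safe_preds):
    # A rule is safe if all its variables are safe.
    terms = list(rule.head)
    terms += [a for _, args in rule.positive for a in args]
    terms += [a for eq in rule.equalities for a in eq]
    terms += list(rule.other_terms)
    return {t for t in terms if is_var(t)} <= safe_variables(rule, safe_preds)


# bettergrade(G1) <- took('Joe Doe', cs143, G), G1 > G.
bettergrade = Rule(head=("G1",), positive=[("took", ("'Joe Doe'", "cs143", "G"))],
                   other_terms=["G1", "G"])
# s(Z,b,W) <- q(X,X,Y), p(Y,Z,a), W=Z, W > 24.3.
s_rule = Rule(head=("Z", "b", "W"),
              positive=[("q", ("X", "X", "Y")), ("p", ("Y", "Z", "a"))],
              equalities=[("W", "Z")], other_terms=["W", "24.3"])

print(rule_is_safe(bettergrade, {"took"}))  # False: G1 is not safe
print(rule_is_safe(s_rule, {"q", "p"}))     # True: W is safe through W = Z
```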
23
From Safe Rules to RA [Step 1]
P is transformed into an equivalent program P' that does not contain any equality goal, by replacing equals with equals and removing the equality goals. For example:
s(Z,b,W) ← q(X,X,Y), p(Y,Z,a), W=Z, W > 24.3.
is translated into:
s(Z,b,Z) ← q(X,X,Y), p(Y,Z,a), Z > 24.3.
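A sketch of Step 1 on a hypothetical rule representation: terms are strings, goals are (predicate, arguments) pairs, and the comparison W > 24.3 is written as the goal (">", ("W", "24.3")). Each equality merges two terms, with a variable always replaced by its partner, and then every term is rewritten to its representative.

```python
def is_var(term):
    # Slide convention: variables start with an upper-case letter.
    return term[:1].isupper()


def eliminate_equalities(head, goals, equalities):
    """Replace equals with equals, then drop the equality goals.
    head: tuple of terms; goals: list of (predicate, args); equalities: list of (X, Y)."""
    rep = {}  # maps a variable to the term that replaces it

    def find(term):
        while term in rep:
            term = rep[term]
        return term

    for x, y in equalities:
        a, b = find(x), find(y)
        if a == b:
            continue
        if is_var(a):
            rep[a] = b  # replace variable a by b everywhere
        elif is_var(b):
            rep[b] = a  # replace variable b by the constant a
        # Two distinct constants make the body unsatisfiable (not handled here).

    def rename(args):
        return tuple(find(t) for t in args)

    return rename(head), [(p, rename(args)) for p, args in goals]


# s(Z,b,W) <- q(X,X,Y), p(Y,Z,a), W=Z, W > 24.3.
new_head, new_goals = eliminate_equalities(
    ("Z", "b", "W"),
    [("q", ("X", "X", "Y")), ("p", ("Y", "Z", "a")), (">", ("W", "24.3"))],
    [("W", "Z")])
print(new_head)   # ('Z', 'b', 'Z')
print(new_goals)  # W has been replaced by Z in the comparison goal
```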
24
Mapping [Step 2]
The body of r is translated into the RA expression \( Body_r \). \( Body_r \) is the Cartesian product of all (base or derived) relations in the body, followed by a selection \( \sigma_F \), where F is the conjunction of the following conditions:
(i) a comparison condition for each comparison goal (e.g., Z > 24.3),
(ii) equality between columns containing the same variable,
(iii) equality between a column and the constant therein.
For example, for
s(Z,b,Z) ← q(X,X,Y), p(Y,Z,a), Z > 24.3.
(i) Z > 24.3 translates into the selection condition $5 > 24.3, (ii) the two occurrences of X translate into $1 = $2, while the two Ys map into $3 = $4, and (iii) the constant in the last column of P maps into $6 = a. Thus:
\[ Body_r = \sigma_{\$1 = \$2 \land \$3 = \$4 \land \$6 = a \land \$5 > 24.3}(Q \times P) \]
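A sketch that computes \( Body_r \) for the example rule on small hypothetical relations Q and P (the notes give no data for q and p). It then checks the result against a direct evaluation of the rule body, which enumerates satisfying variable bindings and lays each one out as a product tuple.

```python
from itertools import product

# Hypothetical relations for q and p (not from the notes).
Q = {(1, 1, "y1"), (1, 2, "y1"), (3, 3, "y2")}
P = {("y1", 30.0, "a"), ("y1", 10.0, "a"), ("y2", 50.0, "c")}


def body_r(q, p):
    # Cartesian product Q x P followed by the selection
    # $1 = $2 AND $3 = $4 AND $6 = a AND $5 > 24.3  (1-based; t[k-1] in Python)
    return {t for t in (a + b for a, b in product(q, p))
            if t[0] == t[1] and t[2] == t[3] and t[5] == "a" and t[4] > 24.3}


def body_direct(q, p):
    # Bindings satisfying q(X,X,Y), p(Y,Z,a), Z > 24.3, written as product tuples.
    return {(x1, x2, y, y2, z, c) for (x1, x2, y) in q for (y2, z, c) in p
            if x1 == x2 and y == y2 and c == "a" and z > 24.3}


print(body_r(Q, P))  # {(1, 1, 'y1', 'y1', 30.0, 'a')}
```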
25
Mapping Datalog to RA [Steps 3 & 4]
Step 3: Each rule r is translated into a generalized projection on \( Body_r \), according to the patterns in the head of r. For the rule at hand,
s(Z,b,Z) ← q(X,X,Y), p(Y,Z,a), Z > 24.3.
we obtain:
\[ S = \pi_{\$5, b, \$5}(Body_r) \]
Step 4: Multiple rules with the same head are translated into the union of their equivalent expressions.
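A sketch of Steps 2 to 4 applied to the two scndreq rules from the Disjunction slide. Each rule body becomes a selection over took × student, Step 3 projects on the head pattern scndreq(Name), and Step 4 unions the two rule expressions. The fact for "Ann Lee" is hypothetical and is added only so that the answer is not empty.

```python
from itertools import product

took = {("Joe Doe", "cs123", 2.7), ("Jim Jones", "cs101", 3.0),
        ("Jim Jones", "cs143", 3.3), ("Jim Black", "cs143", 3.3),
        ("Jim Black", "cs101", 2.7),
        ("Ann Lee", "cs151", 3.7)}        # hypothetical extra fact
student = {("Joe Doe", "cs", "senior"), ("Jim Jones", "cs", "junior"),
           ("Jim Black", "ee", "junior"),
           ("Ann Lee", "cs", "junior")}   # hypothetical extra fact


def body(course):
    # Step 2 for scndreq(Name) <- took(Name, course, Grade), Grade > 3.0, student(Name, Major, junior):
    # sigma_{$2 = course AND $3 > 3.0 AND $1 = $4 AND $6 = junior}(took x student)
    return {t for t in (a + b for a, b in product(took, student))
            if t[1] == course and t[2] > 3.0 and t[0] == t[3] and t[5] == "junior"}


def rule_expr(course):
    # Step 3: project on the head pattern scndreq(Name), i.e. pi_{$1}
    return {(t[0],) for t in body(course)}


# Step 4: the two rules with head scndreq are combined by union.
scndreq = rule_expr("cs131") | rule_expr("cs151")
print(scndreq)  # {('Ann Lee',)}
```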
26
Equivalence of RA and Safe Nonrecursive Datalog Programs
Negated goals: a little more complex; see homework!
Theorem (equivalence of RA and safe nonrecursive Datalog): Let P be a safe Datalog program without recursion or function symbols. Then, for each predicate in P, there exists an equivalent relational algebra expression.
27
Relational Tables for a BoM application

PART_COST
BASIC_PART   SUPPLIER     COST   TIME
top_tube     cinelli      20.00  14
top_tube     columbus     15.00  6
down_tube    columbus     10.00  6
head_tube    cinelli      20.00  14
head_tube    columbus     15.00  6
seat_mast    cinelli      20.00  6
seat_mast    cinelli      15.00  14
seat_stay    cinelli      15.00  14
seat_stay    columbus     10.00  6
chain_stay   columbus     10.00  6
fork         cinelli      40.00  14
fork         columbus     30.00  6
spoke        campagnolo   0.60   15
nipple       mavic        0.10   3
hub          campagnolo   31.00  5
hub          suntour      18.00  14
rim          mavic        50.00  3
rim          araya        7.00   1

ASSEMBLY
PART    SUBPART      QTY
bike    frame        1
bike    wheel        2
frame   top_tube     1
frame   down_tube    1
frame   head_tube    1
frame   seat_mast    1
frame   seat_stay    2
frame   chain_stay   2
frame   fork         1
wheel   spoke        36
wheel   nipple       1
wheel   rim          1
wheel   hub          1
wheel   tire         1