CS240A: Databases and Knowledge Bases Recursion and Stratification


1 CS240A: Databases and Knowledge Bases Recursion and Stratification
Notes from Chapter 8 of Advanced Database Systems by Zaniolo, Ceri, Faloutsos, Snodgrass, Subrahmanian and Zicari. Morgan Kaufmann, 1997.
Carlo Zaniolo, Department of Computer Science, University of California, Los Angeles

2 Transitive Closure Queries
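Transitive closure of the graph arc(X, Y), with the recursive goal on the right:
    path(X, Y) ← arc(X, Y).
    path(X, Z) ← arc(X, Y), path(Y, Z).
Transitive closure of the graph arc(X, Y), with the recursive goal on the left:
    path(X, Y) ← arc(X, Y).
    path(X, Z) ← path(X, Y), arc(Y, Z).
Transitive closure of the graph arc(X, Y), with two recursive goals:
    path(X, Y) ← arc(X, Y).
    path(X, Z) ← path(X, Y), path(Y, Z).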
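As a rough illustration of how the three rule forms compute the same relation, here is a minimal Python sketch of a naive bottom-up fixpoint. The function name transitive_closure, the form parameter, and the sample arcs are assumptions made for this example; they do not come from the slides.

```python
def transitive_closure(arcs, form="right"):
    """Naive bottom-up evaluation of path for one of the three rule forms."""
    path = set(arcs)  # exit rule: path(X, Y) <- arc(X, Y)
    while True:
        if form == "right":    # path(X, Z) <- arc(X, Y), path(Y, Z)
            new = {(x, z) for (x, y) in arcs for (y2, z) in path if y == y2}
        elif form == "left":   # path(X, Z) <- path(X, Y), arc(Y, Z)
            new = {(x, z) for (x, y) in path for (y2, z) in arcs if y == y2}
        else:                  # path(X, Z) <- path(X, Y), path(Y, Z)
            new = {(x, z) for (x, y) in path for (y2, z) in path if y == y2}
        if new <= path:        # nothing new was derived: fixpoint reached
            return path
        path |= new


# Hypothetical sample graph: a chain a -> b -> c -> d.
arcs = {("a", "b"), ("b", "c"), ("c", "d")}
closure = transitive_closure(arcs, "right")
```

All three forms reach the same fixpoint; they differ only in how many iterations and joins the evaluation needs.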

3 Relational Tables for a BoM application
PART_COST (columns: BASIC_PART, SUPPLIER, COST, TIME)
top_tube cinelli 20.00 14 columbus 15.00 6 down_tube 10.00 head_tube seat_mast seat_stay chain_stay fork 40.00 30.00 spoke campagnolo 0.60 15 nipple mavic 0.10 3 hub 31.00 5 suntour 18.00 rim 50.00 araya 7.00 1

ASSEMBLY (columns: PART, SUBPART, QTY)
bike frame 1 wheel 2 top_tube down_tube head_tube seat_mast seat_stay chain_stay fork spoke 36 nipple rim hub tire

4 Subparts
All subparts: a transitive-closure query
    all_subparts(Part, Sub) ← assembly(Part, Sub, _).
    all_subparts(Part, Sub2) ← all_subparts(Part, Sub1), assembly(Sub1, Sub2, _).
For each part, basic or otherwise, find its basic subparts. A basic part is a subpart of itself:
    basic_subparts(BasicP, BasicP) ← part_cost(BasicP, _, _, _).
    basic_subparts(Prt, BasicP) ← assembly(Prt, SubP, _), basic_subparts(SubP, BasicP).
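The basic_subparts rules can be sketched in Python as a fixpoint over sets of tuples. This is only an illustration: the assembly pairs below are a small hypothetical subset of the bill of materials (quantities dropped), and basic_parts stands in for the parts that appear in part_cost.

```python
# Hypothetical subset of ASSEMBLY as (part, subpart) pairs; QTY is ignored here.
assembly = {
    ("bike", "frame"), ("bike", "wheel"),
    ("frame", "top_tube"),
    ("wheel", "spoke"), ("wheel", "rim"),
}
# Stand-in for the parts listed in part_cost (assumed for this example).
basic_parts = {"top_tube", "spoke", "rim"}


def basic_subparts(assembly, basic_parts):
    # basic_subparts(BasicP, BasicP) <- part_cost(BasicP, _, _, _).
    result = {(p, p) for p in basic_parts}
    while True:
        # basic_subparts(Prt, BasicP) <- assembly(Prt, SubP, _),
        #                                basic_subparts(SubP, BasicP).
        new = {(prt, b) for (prt, sub) in assembly
               for (s, b) in result if s == sub}
        if new <= result:
            return result
        result |= new


subparts = basic_subparts(assembly, basic_parts)
bike_basics = {b for (p, b) in subparts if p == "bike"}
```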

5 Negation and Recursion
For each basic part, find the least time needed for delivery:
    fastest(Part, Time) ← part_cost(Part, Sup1, _, Time), ¬faster(Part, Time).
    faster(Part, Time) ← part_cost(Part, _, _, Time), part_cost(Part, _, _, Time1), Time1 < Time.
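A minimal Python sketch of the two rules, evaluated stratum by stratum: faster is computed in full before the negated goal in fastest is checked. The three part_cost rows are sample data chosen for illustration.

```python
# Sample rows (BASIC_PART, SUPPLIER, COST, TIME), assumed for illustration.
part_cost = [
    ("top_tube", "cinelli", 20.00, 14),
    ("top_tube", "columbus", 15.00, 6),
    ("spoke", "campagnolo", 0.60, 15),
]

# Lower stratum:
# faster(Part, Time) <- part_cost(Part, _, _, Time),
#                       part_cost(Part, _, _, Time1), Time1 < Time.
faster = {(p, t)
          for (p, _, _, t) in part_cost
          for (p1, _, _, t1) in part_cost
          if p == p1 and t1 < t}

# Upper stratum, evaluated only after faster is complete:
# fastest(Part, Time) <- part_cost(Part, Sup1, _, Time), not faster(Part, Time).
fastest = {(p, t) for (p, _, _, t) in part_cost if (p, t) not in faster}
```

Note that faster(Part, Time) holds for the times that are beaten by some other supplier, so its negation selects the minimum.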

6 Negation and Recursion (cont.)
Times required for the basic subparts of the given assembly:
    timeForbasic(AssPart, BasicSub, Time) ← basic_subparts(AssPart, BasicSub), fastest(BasicSub, Time).
The maximum time required for the basic subparts of the given assembly:
    howsoon(AssPart, Time) ← timeForbasic(AssPart, _, Time), ¬larger(AssPart, Time).
    larger(Part, Time) ← timeForbasic(Part, _, Time), timeForbasic(Part, _, Time1), Time1 > Time.
Note: to compute howsoon you must first compute larger completely.
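The same pattern can be sketched in Python for howsoon. The basic_subparts and fastest facts below are hypothetical inputs, standing in for the results of the earlier rules.

```python
# Hypothetical input facts, assumed to come from the earlier rules.
basic_subparts = {
    ("wheel", "spoke"), ("wheel", "nipple"),
    ("spoke", "spoke"), ("nipple", "nipple"),
}
fastest = {("spoke", 15), ("nipple", 3)}

# timeForbasic(AssPart, BasicSub, Time) <- basic_subparts(AssPart, BasicSub),
#                                          fastest(BasicSub, Time).
time_for_basic = {(a, b, t)
                  for (a, b) in basic_subparts
                  for (b1, t) in fastest if b == b1}

# larger(Part, Time) holds when some basic subpart of Part needs MORE than Time.
larger = {(p, t)
          for (p, _, t) in time_for_basic
          for (p1, _, t1) in time_for_basic
          if p == p1 and t1 > t}

# howsoon is computed only after larger is complete (negated goal).
howsoon = {(p, t) for (p, _, t) in time_for_basic if (p, t) not in larger}
```

Here the wheel can be delivered no sooner than its slowest basic subpart, the spoke.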

7 Predicate Dependency Graph
The Predicate Dependency Graph for a program P, pdg(P), is a graph having as nodes the names of the predicates in P. The graph contains an arc a → b if there exists a rule with goal name a and head name b. If the goal is negated, then the arc is marked as a negative arc.
The nodes and arcs of the strong components of pdg(P) identify, respectively, the recursive predicates and the recursive rules of P.
A program is said to be stratifiable when none of its negative arcs belongs to a strong component.
Stratifiable programs have a clear meaning; non-stratifiable programs are often ill-defined from a semantic viewpoint.
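The stratifiability test can be sketched in Python as follows. The arc list is derived by hand from the howsoon rules of the previous slides; the representation (goal, head, negated) and the function names are assumptions made for this example. A negative arc a → b lies in a strong component exactly when a is also reachable from b.

```python
# Arcs of pdg(P) as (goal, head, negated), read off the howsoon program.
arcs = [
    ("assembly", "basic_subparts", False),
    ("part_cost", "basic_subparts", False),
    ("basic_subparts", "basic_subparts", False),
    ("part_cost", "faster", False),
    ("part_cost", "fastest", False),
    ("faster", "fastest", True),
    ("basic_subparts", "timeForbasic", False),
    ("fastest", "timeForbasic", False),
    ("timeForbasic", "larger", False),
    ("timeForbasic", "howsoon", False),
    ("larger", "howsoon", True),
]


def reachable(arcs):
    """Pairs (a, b) such that b can be reached from a along one or more arcs."""
    reach = {(a, b) for (a, b, _) in arcs}
    while True:
        new = {(a, d) for (a, b) in reach for (c, d) in reach if b == c}
        if new <= reach:
            return reach
        reach |= new


def is_stratifiable(arcs):
    reach = reachable(arcs)
    # A negative arc a -> b is inside a cycle iff b reaches back to a.
    return not any(neg and (b, a) in reach for (a, b, neg) in arcs)


howsoon_ok = is_stratifiable(arcs)
# Classic non-stratifiable program: win(X) <- move(X, Y), not win(Y).
win_ok = is_stratifiable([("move", "win", False), ("win", "win", True)])
```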

8 PDG for howsoon
[Figure: predicate dependency graph with nodes howsoon, larger, timeForbasic, fastest, basic_subparts, faster, assembly, part_cost]

9 Stratification
By sorting on pdg(P), the nodes of P can be partitioned into a finite set of n strata 1, ..., n such that, for each rule r ∈ P, the predicate name of the head of r belongs to a stratum that is ≥ each stratum containing some positive goal of r, and strictly > each stratum containing some negated goal of r.
A stratification of a program is called strict if every stratum contains either a single predicate or a set of predicates that are mutually recursive.
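One way to obtain stratum numbers that satisfy these inequalities is to start every predicate at stratum 1 and raise heads until no rule is violated. The Python sketch below does this for the howsoon program; the function name stratify and the arc encoding (goal, head, negated) are assumptions for illustration. The result is a valid stratification but not necessarily a strict one, since unrelated predicates may share a stratum.

```python
def stratify(arcs):
    """Assign a stratum to each predicate, or raise if not stratifiable."""
    preds = {p for (a, b, _) in arcs for p in (a, b)}
    stratum = {p: 1 for p in preds}
    changed = True
    while changed:
        changed = False
        for goal, head, negated in arcs:
            # head >= positive goal; head > negated goal
            need = stratum[goal] + (1 if negated else 0)
            if stratum[head] < need:
                # A stratifiable program never needs more strata than predicates.
                if need > len(preds):
                    raise ValueError("program is not stratifiable")
                stratum[head] = need
                changed = True
    return stratum


# Arcs (goal, head, negated) read off the howsoon program.
arcs = [
    ("assembly", "basic_subparts", False),
    ("part_cost", "basic_subparts", False),
    ("basic_subparts", "basic_subparts", False),
    ("part_cost", "faster", False),
    ("part_cost", "fastest", False),
    ("faster", "fastest", True),
    ("basic_subparts", "timeForbasic", False),
    ("fastest", "timeForbasic", False),
    ("timeForbasic", "larger", False),
    ("timeForbasic", "howsoon", False),
    ("larger", "howsoon", True),
]
strata = stratify(arcs)
```

With these arcs, faster lands in stratum 1, fastest, timeForbasic and larger in stratum 2, and howsoon in stratum 3, which matches the order in which the relations must be computed.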

10 One-at-a-Time Computations: needed for aggregates
Set aggregates in SQL, such as count or sum, require that the elements of a set be visited one at a time. (These aggregates also require arithmetic predicates, which we will consider later.)
Counting the elements in a set modulo an integer does not require arithmetic, but still requires that the elements of the set be visited one at a time.
The parity query: how many tuples are in the base relation br(X), an even number or an odd number?

11 The parity query: how many tuples in the base relation br
    between(X, Z) ← br(X), br(Y), br(Z), X < Y, Y < Z.
    next(X, Y) ← br(X), br(Y), X < Y, ¬between(X, Y).
    next(nil, X) ← br(X), ¬smaller(X).
    smaller(X) ← br(X), br(Y), Y < X.
    even(nil).
    even(Y) ← odd(X), next(X, Y).
    odd(Y) ← even(X), next(X, Y).
    br_is_even ← even(X), ¬next(X, Y).
next sorts the elements of br into an ascending chain, where the first link of the chain connects the distinguished node nil to the least element in br (third rule in the example). This works only if br is totally ordered.
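The parity program can be traced with a short Python sketch that mirrors each rule as a set comprehension. Using None for the distinguished node nil and the function name br_is_even are choices made for this example only.

```python
def br_is_even(br):
    """Evaluate the parity rules over the values in br; None plays the role of nil."""
    nil = None
    items = set(br)
    # Lower stratum: between and smaller.
    between = {(x, z) for x in items for y in items for z in items if x < y < z}
    smaller = {x for x in items for y in items if y < x}
    # next links each element to its immediate successor; nil links to the least one.
    nxt = {(x, y) for x in items for y in items
           if x < y and (x, y) not in between}
    nxt |= {(nil, x) for x in items if x not in smaller}
    # even and odd are mutually recursive: walk the chain to a fixpoint.
    even, odd = {nil}, set()
    while True:
        new_even = {y for (x, y) in nxt if x in odd}
        new_odd = {y for (x, y) in nxt if x in even}
        if new_even <= even and new_odd <= odd:
            break
        even |= new_even
        odd |= new_odd
    # br_is_even <- even(X), not next(X, Y): the end of the chain is even.
    has_next = {x for (x, _) in nxt}
    return any(x not in has_next for x in even)


two_items = br_is_even([10, 20])
three_items = br_is_even([3, 1, 2])
```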

12 Expressive Power
Data complexity: query languages are viewed as mappings from the DB to the answer. The big O is evaluated in terms of the size of the database, which is always finite.
DB-PTIME: polynomial data complexity w.r.t. DB size
1. Use Turing machines as the general model of computation and encode the database as a tape of length n.
2. Then any computable function on the database can be encoded as a Turing machine.
3. Some of these machines halt (complete their computation) in O(n) steps, others in an exponential number of steps, and others never terminate.
4. The set of machines that halt in a number of steps that is polynomial in n defines the class of DB-PTIME functions.
Number of tuples, data items, bytes: what DB size are we talking about?

13 Polynomial Data Complexity
Are relational algebra expressions evaluable in DB-PTIME? Yes, and in practice we use indices and query optimizers to keep exponents and coefficients small.
But these languages cannot express every DB-PTIME query. For instance, they cannot express transitive closures or aggregates (thus the most frequently used aggregates were added to SQL in an ad hoc fashion).

14 The Expressive Power Hierarchy
Safe, stratified Datalog programs can still be computed in polynomial time, and they express every DB-PTIME query; thus they are DB-PTIME complete.
But this is only true if we assume that there exists a total order in the database.
Desiderata: order-independence property of queries (genericity), i.e., queries insensitive to constant renaming.
To express all DB-PTIME queries under genericity, a non-deterministic construct such as choice is needed (subject covered in the ADS book, but not in CS240A).
DB-PTIME completeness is well below Turing completeness; for that you need an infinite universe.

15 Functors and Complex Terms
Flat parts, their number, shape and weight, described by part(Part#, Shape, Weight):
    part(202, circle(11), actualkg(0.034)).
    part(121, rectangle(10, 20), unitkg(2.1)).

    part_weight(No, Kilos) ← part(No, _, actualkg(Kilos)).
    part_weight(No, Kilos) ← part(No, Shape, unitkg(K)), area(Shape, Area), Kilos = K * Area.

    area(circle(Dmtr), A) ← A = Dmtr * Dmtr * 3.14/4.
    area(rectangle(Base, Height), A) ← A = Base * Height.
The complex terms circle(11), actualkg(0.034), rectangle(10, 20), and unitkg(2.1) are in logical parlance called functions (a functor followed by a list of arguments in parentheses).

16 Functors (cont.)
In actual applications, these complex terms do not represent functions to be evaluated; they are instead used as variable-length sub-records.
Thus, circle(11) and rectangle(10, 20) denote, respectively, that the shape of our first part is a circle with diameter 11 cm, while the shape of the second part is a rectangle with base 10 cm and height 20 cm.
Any number of sub-arguments is allowed in such complex terms, recursively. Objects of arbitrary complexity, including solid objects, can be nested and represented in this fashion.
Functors are then used as case discriminants.
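To illustrate functors acting as case discriminants, the Python sketch below encodes each complex term as a tuple whose first element is the functor name. This tuple encoding and the function names are assumptions for the example, not a feature of any Datalog system.

```python
# Complex terms encoded as (functor, arg1, ...) tuples; an assumed encoding.
parts = [
    (202, ("circle", 11), ("actualkg", 0.034)),
    (121, ("rectangle", 10, 20), ("unitkg", 2.1)),
]


def area(shape):
    functor = shape[0]
    if functor == "circle":        # area(circle(Dmtr), A) <- A = Dmtr*Dmtr*3.14/4.
        dmtr = shape[1]
        return dmtr * dmtr * 3.14 / 4
    if functor == "rectangle":     # area(rectangle(Base, Height), A) <- A = Base*Height.
        return shape[1] * shape[2]
    raise ValueError(f"unknown shape functor: {functor}")


def part_weight(part):
    _no, shape, (functor, value) = part
    if functor == "actualkg":      # the weight is stored directly
        return value
    if functor == "unitkg":        # weight per unit of area times the area
        return value * area(shape)
    raise ValueError(f"unknown weight functor: {functor}")


weights = {part[0]: part_weight(part) for part in parts}
```

The functor name selects which rule applies, much as a tag selects a variant in a tagged union.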

17 Lists
[] is the empty list. [Head | Tail] represents a non-empty list.
Example: [mary, mike, seattle] is a shorthand for: [mary | [mike | [seattle | []]]]
A list-based representation for suppliers of top_tube:
    part_sup_list(top_tube, [cinelli, columbus, mavic]).
Lists are only syntactic sugar for a particular function symbol.

18 Normalizing a nested relation into a flat relation
    flatten(P, S, L) ← part_sup_list(P, [S | L]).
    flatten(P, S, L) ← flatten(P, _, [S | L]).
    ps(Part, Sup) ← flatten(Part, Sup, _).
This program applied to the previous fact yields:
    ps(top_tube, cinelli)
    ps(top_tube, columbus)
    ps(top_tube, mavic)
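A minimal Python sketch of the flatten rules, with lists encoded as nested (head, tail) pairs to mirror the [Head | Tail] notation and the empty tuple standing for []. The helper cons_list and this encoding are assumptions for illustration.

```python
def cons_list(*items):
    """Build a nested (head, tail) list; () stands for the empty list []."""
    lst = ()
    for item in reversed(items):
        lst = (item, lst)
    return lst


part_sup_list = {("top_tube", cons_list("cinelli", "columbus", "mavic"))}


def flatten(part_sup_list):
    # flatten(P, S, L) <- part_sup_list(P, [S | L]).
    facts = {(p, lst[0], lst[1]) for (p, lst) in part_sup_list if lst}
    while True:
        # flatten(P, S, L) <- flatten(P, _, [S | L]).
        new = {(p, rest[0], rest[1]) for (p, _, rest) in facts if rest}
        if new <= facts:
            return facts
        facts |= new


# ps(Part, Sup) <- flatten(Part, Sup, _).
ps = {(p, s) for (p, s, _) in flatten(part_sup_list)}
```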

19 How to Reconstruct the Nested Relation
    between(P, X, Z) ← ps(P, X), ps(P, Y), ps(P, Z), X < Y, Y < Z.
    smaller(P, X) ← ps(P, X), ps(P, Y), Y < X.
    nested(P, [X]) ← ps(P, X), ¬smaller(P, X).
    nested(P, [Y | [X | W]]) ← nested(P, [X | W]), ps(P, Y), X < Y, ¬between(P, X, Y).
    ps_nested(P, W) ← nested(P, W), ¬nested(P, [X | W]).
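The reconstruction can be sketched in Python with tuples standing in for lists (an assumed encoding; the function name ps_nested follows the rule head). Because each step puts the next larger supplier at the head, the rebuilt list comes out in descending order.

```python
def ps_nested(ps):
    """Rebuild one supplier list per part from the flat ps(Part, Sup) pairs."""
    between = {(p, x, z)
               for (p, x) in ps for (p1, y) in ps for (p2, z) in ps
               if p == p1 == p2 and x < y < z}
    smaller = {(p, x) for (p, x) in ps for (p1, y) in ps if p == p1 and y < x}
    # nested(P, [X]) <- ps(P, X), not smaller(P, X).
    nested = {(p, (x,)) for (p, x) in ps if (p, x) not in smaller}
    while True:
        # nested(P, [Y | [X | W]]) <- nested(P, [X | W]), ps(P, Y), X < Y,
        #                             not between(P, X, Y).
        new = {(p, (y,) + lst)
               for (p, lst) in nested for (p1, y) in ps
               if p == p1 and lst[0] < y and (p, lst[0], y) not in between}
        if new <= nested:
            break
        nested |= new
    # ps_nested(P, W) <- nested(P, W), not nested(P, [X | W]).
    return {(p, w) for (p, w) in nested
            if not any(p1 == p and lst[1:] == w for (p1, lst) in nested)}


ps = {("top_tube", "cinelli"), ("top_tube", "columbus"), ("top_tube", "mavic")}
rebuilt = ps_nested(ps)
```

The final rule keeps only the longest list for each part, since every shorter list is the tail of a longer one.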

20 Conclusion
Recursion and stratified negation assure DB-PTIME completeness for Datalog.
Practical systems such as LogicBlox and DeAL also support function symbols, arithmetic expressions, and many other constructs, which make the language Turing-complete.

