CS240A: Databases and Knowledge Bases
Recursion and Stratification
Carlo Zaniolo
Department of Computer Science, University of California, Los Angeles
December,
Notes from Chapter 8 of Advanced Database Systems by Zaniolo, Ceri, Faloutsos, Snodgrass, Subrahmanian and Zicari. Morgan Kaufmann, 1997
Transitive Closure Queries
Transitive closure of the graph arc(X, Y), right-linear form:
    path(X, Y) ← arc(X, Y).
    path(X, Z) ← arc(X, Y), path(Y, Z).
Transitive closure of the graph arc(X, Y), left-linear form:
    path(X, Y) ← arc(X, Y).
    path(X, Z) ← path(X, Y), arc(Y, Z).
Transitive closure of the graph arc(X, Y), non-linear form:
    path(X, Y) ← arc(X, Y).
    path(X, Z) ← path(X, Y), path(Y, Z).
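The left-linear program above can be evaluated bottom-up as a least fixpoint. The following Python sketch is only an illustration, not the evaluation strategy of any particular system: the function name transitive_closure and the sample arcs are assumptions, and the loop uses a simple differential (semi-naive) iteration so that each round joins only the newest path facts with arc.

```python
def transitive_closure(arcs):
    """Least fixpoint of:
         path(X, Y) <- arc(X, Y).
         path(X, Z) <- path(X, Y), arc(Y, Z).
    """
    path = set(arcs)    # facts produced by the exit rule
    delta = set(arcs)   # facts that were new in the previous round
    while delta:
        # Recursive rule applied only to the newest path facts.
        derived = {(x, z) for (x, y) in delta for (y2, z) in arcs if y == y2}
        delta = derived - path
        path |= delta
    return path


# Hypothetical sample graph: a -> b -> c -> d
arcs = {("a", "b"), ("b", "c"), ("c", "d")}
closure = transitive_closure(arcs)
```

The loop stops when a round derives nothing new, which is guaranteed because path can hold at most one fact per pair of nodes.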
Relational Tables for a BoM Application

PART_COST
BASIC_PART   SUPPLIER     COST   TIME
top_tube     cinelli
top_tube     columbus
down_tube    columbus
head_tube    cinelli
head_tube    columbus
seat_mast    cinelli
seat_mast    cinelli
seat_stay    cinelli
seat_stay    columbus
chain_stay   columbus
fork         cinelli
fork         columbus
spoke        campagnolo
nipple       mavic        0.10   3
hub          campagnolo
hub          suntour
rim          mavic
rim          araya        7.00   1

ASSEMBLY
PART    SUBPART      QTY
bike    frame        1
bike    wheel        2
frame   top_tube     1
frame   down_tube    1
frame   head_tube    1
frame   seat_mast    1
frame   seat_stay    2
frame   chain_stay   2
frame   fork         1
wheel   spoke        36
wheel   nipple       36
wheel   rim          1
wheel   hub          1
wheel   tire         1
Subparts
All subparts: a transitive-closure query
    all_subparts(Part, Sub) ← assembly(Part, Sub, _).
    all_subparts(Part, Sub2) ← all_subparts(Part, Sub1), assembly(Sub1, Sub2, _).
For each part, basic or otherwise, find its basic subparts. A basic part is a subpart of itself:
    basic_subparts(BasicP, BasicP) ← part_cost(BasicP, _, _, _).
    basic_subparts(Prt, BasicP) ← assembly(Prt, SubP, _), basic_subparts(SubP, BasicP).
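As a rough illustration of the basic_subparts rules, the sketch below runs them to a fixpoint in Python over the ASSEMBLY rows and over the part names that appear in PART_COST. The Python names (assembly, basic_parts, basic_subparts) mirror the predicates but are otherwise assumptions of this sketch.

```python
# Rows of the ASSEMBLY table: (part, subpart, qty)
assembly = [
    ("bike", "frame", 1), ("bike", "wheel", 2),
    ("frame", "top_tube", 1), ("frame", "down_tube", 1),
    ("frame", "head_tube", 1), ("frame", "seat_mast", 1),
    ("frame", "seat_stay", 2), ("frame", "chain_stay", 2),
    ("frame", "fork", 1),
    ("wheel", "spoke", 36), ("wheel", "nipple", 36),
    ("wheel", "rim", 1), ("wheel", "hub", 1), ("wheel", "tire", 1),
]

# Basic parts are the part names that occur in PART_COST.
basic_parts = {
    "top_tube", "down_tube", "head_tube", "seat_mast", "seat_stay",
    "chain_stay", "fork", "spoke", "nipple", "hub", "rim",
}


def basic_subparts(assembly, basic_parts):
    # Exit rule: every basic part is a basic subpart of itself.
    result = {(p, p) for p in basic_parts}
    while True:
        # Recursive rule: climb one assembly level per round.
        new = {(prt, b)
               for (prt, sub, _) in assembly
               for (s, b) in result if s == sub} - result
        if not new:
            return result
        result |= new


bs = basic_subparts(assembly, basic_parts)
wheel_basic = {b for (p, b) in bs if p == "wheel"}
```

Note that tire never appears in the result: it has no PART_COST row, so the exit rule does not treat it as a basic part.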
Negation and Recursion
For each basic part, find the least time needed for delivery:
    fastest(Part, Time) ← part_cost(Part, Sup, Cost, Time), ¬faster(Part, Time).
    faster(Part, Time) ← part_cost(Part, Sup, Cost, Time), part_cost(Part, Sup1, Cost1, Time1), Time1 < Time.
Negation and Recursion (cont.)
Times required for the basic subparts of the given assembly:
    timeForbasic(AssPart, BasicSub, Time) ← basic_subparts(AssPart, BasicSub), fastest(BasicSub, Time).
The maximum time required for the basic subparts of the given assembly:
    howsoon(AssPart, Time) ← timeForbasic(AssPart, _, Time), ¬larger(AssPart, Time).
    larger(Part, Time) ← timeForbasic(Part, _, Time), timeForbasic(Part, _, Time1), Time1 > Time.
Note: to compute howsoon you must first compute larger completely.
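The note above can be made concrete with a small stratum-by-stratum evaluation in Python. This is a sketch only: the parts p1 and p2, the suppliers, the costs and the times are hypothetical toy values (not the PART_COST data), and basic_subparts is given directly as facts. Each negated predicate is computed in full before the rule that negates it is applied.

```python
# Hypothetical toy facts: part_cost(Part, Supplier, Cost, Time)
part_cost = [("p1", "s1", 10.0, 14), ("p1", "s2", 15.0, 6), ("p2", "s1", 5.0, 3)]
# Hypothetical toy facts: basic_subparts(AssPart, BasicSub)
basic_subparts = [("a", "p1"), ("a", "p2")]

# faster(Part, Time): some supplier delivers Part in less than Time.
faster = {(p, t)
          for (p, _, _, t) in part_cost
          for (p1, _, _, t1) in part_cost if p == p1 and t1 < t}

# fastest uses "not faster", so faster must already be complete here.
fastest = {(p, t) for (p, _, _, t) in part_cost if (p, t) not in faster}

time_for_basic = {(a, b, t)
                  for (a, b) in basic_subparts
                  for (p, t) in fastest if p == b}

# larger(Part, Time): some basic subpart of Part needs more than Time.
larger = {(a, t)
          for (a, _, t) in time_for_basic
          for (a1, _, t1) in time_for_basic if a == a1 and t1 > t}

# howsoon uses "not larger", so larger must already be complete here.
howsoon = {(a, t) for (a, _, t) in time_for_basic if (a, t) not in larger}
```

With these toy facts the assembly a can be delivered no sooner than its slowest basic subpart, taking the fastest supplier for each.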
Predicate Dependency Graph
The Predicate Dependency Graph for a program P, written pdg(P), is a graph having as nodes the names of the predicates in P.
The graph contains an arc a → b if there exists a rule with goal name a and head name b. If the goal is negated, the arc is marked as a negative arc.
The nodes and arcs of the strong components of pdg(P) identify, respectively, the recursive predicates and recursive rules of P.
A program is said to be stratifiable when none of its negative arcs belongs to a strong component.
Stratifiable programs have a clear meaning; non-stratifiable programs are often ill-defined from a semantic viewpoint.
PDG for howsoon
[Figure: predicate dependency graph with nodes howsoon, larger, timeForbasic, basic_subparts, assembly, fastest, faster, part_cost.]
Stratification
By sorting pdg(P), the nodes of P can be partitioned into a finite set of n strata 1, ..., n, such that, for each rule r ∈ P, the predicate name of the head of r belongs to a stratum that is ≥ each stratum containing some positive goal of r, and strictly > each stratum containing some negated goal of r.
A stratification of a program is called strict if every stratum contains either a single predicate or a set of predicates that are mutually recursive.
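One simple way to obtain a stratification, sketched below in Python, is to start every predicate in stratum 1 and repeatedly raise head strata until both conditions hold. This is an illustrative relaxation procedure, not necessarily the sorting method meant on the slide, and it yields a stratification that need not be strict. The function name stratify and the encoding of rules as (head, [(goal, negated), ...]) are assumptions of this sketch.

```python
def stratify(rules):
    """Return {predicate: stratum}, or None if the program is not stratifiable.

    rules: list of (head, body), where body is a list of (goal, negated).
    """
    preds = {h for h, _ in rules} | {g for _, body in rules for g, _ in body}
    stratum = {p: 1 for p in preds}
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            for goal, negated in body:
                # head >= positive goal, head > negated goal
                needed = stratum[goal] + (1 if negated else 0)
                if stratum[head] < needed:
                    if needed > len(preds):
                        # Strata keep growing: a negative arc lies on a cycle.
                        return None
                    stratum[head] = needed
                    changed = True
    return stratum


# The howsoon program, one entry per rule.
howsoon_rules = [
    ("fastest", [("part_cost", False), ("faster", True)]),
    ("faster", [("part_cost", False), ("part_cost", False)]),
    ("basic_subparts", [("part_cost", False)]),
    ("basic_subparts", [("assembly", False), ("basic_subparts", False)]),
    ("timeForbasic", [("basic_subparts", False), ("fastest", False)]),
    ("larger", [("timeForbasic", False), ("timeForbasic", False)]),
    ("howsoon", [("timeForbasic", False), ("larger", True)]),
]
strata = stratify(howsoon_rules)

# A program with negation inside a recursive cycle: p <- not q.  q <- p.
bad = stratify([("p", [("q", True)]), ("q", [("p", False)])])
```

For the howsoon program this places faster in the lowest stratum, fastest and larger above it, and howsoon on top, which matches the order in which the negated predicates must be completed.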
One-at-a-Time Computations: needed for aggregates
Set aggregates in SQL, such as count or sum, require that the elements of a set be visited one at a time. (These aggregates also require arithmetic predicates, which we will consider later.)
Counting the elements in a set modulo an integer does not require arithmetic, but still requires that the elements of the set be visited one at a time.
The parity query: does the base relation br(X) contain an even or an odd number of tuples?
The parity query: is the number of tuples in the base relation br even or odd?
    between(X, Z) ← br(X), br(Y), br(Z), X < Y, Y < Z.
    next(X, Y) ← br(X), br(Y), X < Y, ¬between(X, Y).
    next(nil, X) ← br(X), ¬smaller(X).
    smaller(X) ← br(X), br(Y), Y < X.
    even(nil).
    even(Y) ← odd(X), next(X, Y).
    odd(Y) ← even(X), next(X, Y).
    br_is_even ← even(X), ¬next(X, Y).
next sorts the elements of br into an ascending chain, where the first link of the chain connects the distinguished node nil to the least element in br (third rule in the example).
This works only if br is totally ordered.
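The parity program can be traced with a direct Python transcription, evaluated one stratum at a time. This is a sketch under assumptions: the function name parity_is_even is hypothetical, None stands in for the constant nil, and the elements of br are assumed to be mutually comparable (the total order the slide requires).

```python
NIL = None  # stands in for the distinguished constant nil


def parity_is_even(br):
    br = set(br)
    # Lowest stratum: between and smaller use no negation.
    between = {(x, z) for x in br for y in br for z in br if x < y < z}
    smaller = {x for x in br for y in br if y < x}
    # next negates between and smaller, so it is computed after them.
    nxt = {(x, y) for x in br for y in br if x < y and (x, y) not in between}
    nxt |= {(NIL, x) for x in br if x not in smaller}
    # even/odd are mutually recursive: iterate to a fixpoint.
    even, odd = {NIL}, set()
    changed = True
    while changed:
        new_even = even | {y for (x, y) in nxt if x in odd}
        new_odd = odd | {y for (x, y) in nxt if x in even}
        changed = (new_even, new_odd) != (even, odd)
        even, odd = new_even, new_odd
    # br_is_even: the last element of the chain (no successor) is even.
    has_next = {x for (x, _) in nxt}
    return any(x not in has_next for x in even)
```

For instance, parity_is_even({1, 2}) walks the chain nil, 1, 2 and finds that the last element 2 is even, while for an empty br the only chain element is nil itself.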
Expressive Power
Data Complexity: query languages are viewed as mappings from the DB to the answer. The big O is evaluated in terms of the size of the database, which is always finite.
DB-PTIME: Polynomial Data Complexity w.r.t. DB size
1. Use Turing machines as the general model of computation and encode the database as a tape of length n.
2. Then any computable function on the database can be encoded as a Turing machine.
3. Some of these machines halt (complete their computation) in O(n) steps, others in an exponential number of steps, and others never terminate.
4. The set of machines that halt in a number of steps that is polynomial in n defines the class of DB-PTIME functions.
Number of tuples, data items, bytes: what DB size are we talking about?
Polynomial Data Complexity
Are relational algebra expressions evaluable in DB-PTIME? Yes, and in practice we use indices and query optimizers to keep exponents and coefficients small.
But these languages cannot express all DB-PTIME queries. For instance, they cannot express transitive closures or aggregates (thus the most frequently used aggregates were added to SQL in an ad hoc fashion).
The Expressive Power Hierarchy
Safe, stratified Datalog programs can still be computed in polynomial time and express every DB-PTIME query; thus they are DB-PTIME complete.
But this is only true if we assume that there exists a total order on the database.
Desideratum: order independence of queries (genericity), i.e., queries are insensitive to constant renaming.
To express all DB-PTIME queries under genericity, a non-deterministic construct such as choice is needed (a subject covered in the ADS book, but not in CS240A).
DB-PTIME completeness is well below Turing completeness; for that you need an infinite universe.
Functors and Complex Terms
Flat parts, their number, shape and weight, following the schema part(Part#, Shape, Weight):
    part(202, circle(11), actualkg(0.034)).
    part(121, rectangle(10, 20), unitkg(2.1)).
    part_weight(No, Kilos) ← part(No, _, actualkg(Kilos)).
    part_weight(No, Kilos) ← part(No, Shape, unitkg(K)), area(Shape, Area), Kilos = K * Area.
    area(circle(Dmtr), A) ← A = Dmtr * Dmtr * 3.14/4.
    area(rectangle(Base, Height), A) ← A = Base * Height.
The complex terms circle(11), actualkg(0.034), rectangle(10, 20), and unitkg(2.1) are, in logical parlance, called functions (a functor followed by a list of arguments in parentheses).
Functors (cont.)
In actual applications, these complex terms do not represent functions to be evaluated; they are instead used as variable-length sub-records.
Thus, circle(11) and rectangle(10, 20) denote, respectively, that the shape of our first part is a circle with diameter 11 cm, while the shape of the second part is a rectangle with base 10 cm and height 20 cm.
Any number of sub-arguments is allowed in such complex terms, recursively. Objects of arbitrary complexity, including solid objects, can be nested and represented in this fashion.
Functors are then used as case discriminants.
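The role of functors as case discriminants can be imitated in Python with tagged tuples, where the first component plays the part of the functor. The sketch below is an analogy only, not how a Datalog system stores complex terms; the tuple encoding and the function names area and part_weight are assumptions that mirror the rules for the two part facts.

```python
# part(Part#, Shape, Weight) with complex terms encoded as tagged tuples
parts = [
    (202, ("circle", 11), ("actualkg", 0.034)),
    (121, ("rectangle", 10, 20), ("unitkg", 2.1)),
]


def area(shape):
    functor, *args = shape          # the functor selects the case
    if functor == "circle":
        (dmtr,) = args
        return dmtr * dmtr * 3.14 / 4
    if functor == "rectangle":
        base, height = args
        return base * height
    raise ValueError(f"unknown shape functor: {functor}")


def part_weight(part):
    no, shape, weight = part
    functor, value = weight
    if functor == "actualkg":       # weight is given directly
        return no, value
    if functor == "unitkg":         # weight per unit of area
        return no, value * area(shape)
    raise ValueError(f"unknown weight functor: {functor}")


weights = [part_weight(p) for p in parts]
```

Each branch corresponds to one rule: the rule whose head or body mentions a given functor only fires for terms built with that functor.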
Lists
[] is the empty list.
[Head | Tail] represents a non-empty list.
Example: [mary, mike, seattle] is a shorthand for [mary | [mike | [seattle | []]]].
A list-based representation for the suppliers of top_tube:
    part_sup_list(top_tube, [cinelli, columbus, mavic]).
Lists are only syntactic sugar for a particular function symbol.
Normalizing a nested relation into a flat relation
    flatten(P, S, L) ← part_sup_list(P, [S | L]).
    flatten(P, S, L) ← flatten(P, _, [S | L]).
    ps(Part, Sup) ← flatten(Part, Sup, _).
This program applied to the previous fact yields:
    ps(top_tube, cinelli)
    ps(top_tube, columbus)
    ps(top_tube, mavic)
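To see how the flatten rules peel one element off the list per step, the sketch below models a list [Head | Tail] as a Python pair (head, tail) with () as the empty list. The encoding and the helper names cons_list and flatten are assumptions made for illustration.

```python
NIL = ()  # the empty list []


def cons_list(items):
    """Build [a, b, c] as nested pairs (a, (b, (c, ())))."""
    lst = NIL
    for item in reversed(items):
        lst = (item, lst)
    return lst


def flatten(part_sup_list):
    """Derive flatten(P, S, L) facts from part_sup_list(P, List) facts."""
    flat = set()
    for part, lst in part_sup_list:
        # The first rule handles the full list; the second rule
        # is re-applied to each remaining tail.
        while lst != NIL:
            head, tail = lst
            flat.add((part, head, tail))
            lst = tail
    return flat


part_sup_list = [("top_tube", cons_list(["cinelli", "columbus", "mavic"]))]
ps = {(p, s) for (p, s, _) in flatten(part_sup_list)}
```

Projecting away the third argument, as the ps rule does, leaves one flat tuple per supplier.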
How to Reconstruct the Nested Relation
    between(P, X, Z) ← ps(P, X), ps(P, Y), ps(P, Z), X < Y, Y < Z.
    smaller(P, X) ← ps(P, X), ps(P, Y), Y < X.
    nested(P, [X]) ← ps(P, X), ¬smaller(P, X).
    nested(P, [Y | [X | W]]) ← nested(P, [X | W]), ps(P, Y), X < Y, ¬between(P, X, Y).
    ps_nested(P, W) ← nested(P, W), ¬nested(P, [X | W]).
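A stratum-by-stratum Python transcription shows what these rules build. As before, this is a sketch under assumptions: lists are modeled as pairs (head, tail) with () as the empty list, supplier names are compared as strings, and the function name nest is hypothetical.

```python
def nest(ps):
    """ps: set of (part, supplier). Returns the ps_nested facts."""
    between = {(p, x, z)
               for (p, x) in ps for (p2, y) in ps for (p3, z) in ps
               if p == p2 == p3 and x < y < z}
    smaller = {(p, x) for (p, x) in ps for (p2, y) in ps if p == p2 and y < x}
    # Exit rule: the one-element list holding the least supplier.
    nested = {(p, (x, ())) for (p, x) in ps if (p, x) not in smaller}
    while True:
        # Recursive rule: put the immediate successor Y in front of [X | W].
        new = {(p, (y, (x, w)))
               for (p, (x, w)) in nested for (p2, y) in ps
               if p == p2 and x < y and (p, x, y) not in between} - nested
        if not new:
            break
        nested |= new
    # ps_nested: keep only the lists that cannot be extended further.
    extended = {(p, w) for (p, (_, w)) in nested}
    return {(p, w) for (p, w) in nested if (p, w) not in extended}


ps = {("top_tube", "cinelli"), ("top_tube", "columbus"), ("top_tube", "mavic")}
ps_nested = nest(ps)
```

Because each step adds the next larger element at the head, the reconstructed list comes out in descending order, with the least supplier at the tail.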
Conclusion
Recursion and stratified negation assure DB-PTIME completeness for Datalog (assuming a total order on the database).
Practical systems such as LDL++ also support function symbols, arithmetic expressions, and many other constructs.