CS240A: Databases and Knowledge Bases Recursion and Stratification


CS240A: Databases and Knowledge Bases Recursion and Stratification Notes From Chapter 8 of Advanced Database Systems by Zaniolo, Ceri, Faloutsos, Snodgrass, Subrahmanian and Zicari. Morgan Kaufmann, 1997 Carlo Zaniolo Department of Computer Science University of California, Los Angeles

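Transitive Closure Queries
The transitive closure of the graph arc(X, Y) can be written in three equivalent forms. All three share the exit rule path(X, Y) ← arc(X, Y).

Right-linear recursive rule:
    path(X, Y) ← arc(X, Y).
    path(X, Z) ← arc(X, Y), path(Y, Z).

Left-linear recursive rule:
    path(X, Y) ← arc(X, Y).
    path(X, Z) ← path(X, Y), arc(Y, Z).

Non-linear recursive rule:
    path(X, Y) ← arc(X, Y).
    path(X, Z) ← path(X, Y), path(Y, Z).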
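The left-linear form above can be evaluated bottom-up as a least fixpoint. The following is a minimal Python sketch under stated assumptions, not the evaluation strategy of any particular Datalog system: the function name transitive_closure and the sample arcs are hypothetical, and the loop uses a semi-naive (delta-based) iteration.

```python
def transitive_closure(arcs):
    """Least fixpoint of:
         path(X, Y) <- arc(X, Y).
         path(X, Z) <- path(X, Y), arc(Y, Z).
    """
    arcs = set(arcs)
    path = set(arcs)    # facts produced by the exit rule
    delta = set(arcs)   # path facts that were new in the previous round
    while delta:
        # Apply the recursive rule only to the newest path facts.
        derived = {(x, z) for (x, y) in delta for (y2, z) in arcs if y == y2}
        delta = derived - path
        path |= delta
    return path


# Hypothetical sample graph: a chain a -> b -> c -> d.
example_arcs = {("a", "b"), ("b", "c"), ("c", "d")}
closure = transitive_closure(example_arcs)
```

The loop stops when a round derives no new facts, which must happen because only finitely many pairs of nodes exist.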

Relational Tables for a BoM application

PART_COST (BASIC_PART, SUPPLIER, COST, TIME):
top_tube cinelli 20.00 14 columbus 15.00 6 down_tube 10.00 head_tube seat_mast seat_stay chain_stay fork 40.00 30.00 spoke campagnolo 0.60 15 nipple mavic 0.10 3 hub 31.00 5 suntour 18.00 rim 50.00 araya 7.00 1

ASSEMBLY (PART, SUBPART, QTY):
bike frame 1 wheel 2 top_tube down_tube head_tube seat_mast seat_stay chain_stay fork spoke 36 nipple rim hub tire

Subparts
All subparts: a transitive-closure query.
    all_subparts(Part, Sub) ← assembly(Part, Sub, _).
    all_subparts(Part, Sub2) ← all_subparts(Part, Sub1), assembly(Sub1, Sub2, _).

For each part, basic or otherwise, find its basic subparts. A basic part is a subpart of itself:
    basic_subparts(BasicP, BasicP) ← part_cost(BasicP, _, _, _).
    basic_subparts(Prt, BasicP) ← assembly(Prt, SubP, _), basic_subparts(SubP, BasicP).

Negation and Recursion
For each basic part, find the least time needed for delivery:
    fastest(Part, Time) ← part_cost(Part, Sup1, _, Time), ¬faster(Part, Time).
    faster(Part, Time) ← part_cost(Part, _, _, Time), part_cost(Part, _, _, Time1), Time1 < Time.

Negation and Recursion (cont.)
Times required for the basic subparts of the given assembly:
    timeForbasic(AssPart, BasicSub, Time) ← basic_subparts(AssPart, BasicSub), fastest(BasicSub, Time).

The maximum time required for the basic subparts of the given assembly:
    howsoon(AssPart, Time) ← timeForbasic(AssPart, _, Time), ¬larger(AssPart, Time).
    larger(Part, Time) ← timeForbasic(Part, _, Time), timeForbasic(Part, _, Time1), Time1 > Time.

Note: to compute howsoon you must first compute larger completely.
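The note about computing larger completely before howsoon can be made concrete by evaluating the rules one predicate at a time, negated predicates first. The sketch below is a hedged Python rendering with Python sets standing in for relations. The facts are a small illustrative sample, not the full BoM tables, and the suppliers supplier_a and supplier_b are hypothetical.

```python
# Illustrative facts only: (part, supplier, cost, time) and (part, subpart, qty).
part_cost = {
    ("spoke", "campagnolo", 0.60, 15),
    ("nipple", "mavic", 0.10, 3),
    ("rim", "supplier_a", 50.00, 3),   # hypothetical supplier
    ("rim", "supplier_b", 7.00, 1),    # hypothetical supplier
}
assembly = {
    ("bike", "wheel", 2),
    ("wheel", "spoke", 36),
    ("wheel", "nipple", 36),
    ("wheel", "rim", 1),
}

# faster must be complete before the rule with "not faster" is applied.
faster = {(p, t) for (p, _, _, t) in part_cost
          for (p1, _, _, t1) in part_cost if p == p1 and t1 < t}
fastest = {(p, t) for (p, _, _, t) in part_cost if (p, t) not in faster}

# basic_subparts is recursive but negation-free: iterate to a fixpoint.
basic_subparts = {(p, p) for (p, _, _, _) in part_cost}
changed = True
while changed:
    new = {(prt, b) for (prt, sub, _) in assembly
           for (s, b) in basic_subparts if s == sub}
    changed = not new <= basic_subparts
    basic_subparts |= new

time_for_basic = {(a, b, t) for (a, b) in basic_subparts
                  for (b1, t) in fastest if b == b1}

# larger must be complete before the rule with "not larger" is applied.
larger = {(p, t) for (p, _, t) in time_for_basic
          for (p1, _, t1) in time_for_basic if p == p1 and t1 > t}
howsoon = {(p, t) for (p, _, t) in time_for_basic if (p, t) not in larger}
```

With these sample facts the slowest basic subpart of both wheel and bike is the spoke, so howsoon pairs each of them with the spoke's fastest delivery time.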

Predicate Dependency Graph
The Predicate Dependency Graph for a program P, written pdg(P), is a graph whose nodes are the names of the predicates in P. The graph contains an arc a → b if there exists a rule with goal name a and head name b. If the goal is negated, the arc is marked as a negative arc.
The nodes and arcs of the strong components of pdg(P) identify, respectively, the recursive predicates and the recursive rules of P.
A program is said to be stratifiable when none of its negative arcs belongs to a strong component.
Stratifiable programs have a clear meaning; non-stratifiable programs are often ill-defined from a semantic viewpoint.
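The stratifiability test can be sketched directly from this definition. In the hedged Python example below, a rule is encoded as a pair (head, body), where body is a list of (goal, negated) pairs. This encoding, the function names, and the win/move program used as a non-stratifiable example are assumptions made for illustration.

```python
def dependency_arcs(rules):
    """Arcs (goal, head, negated) of the predicate dependency graph."""
    return {(goal, head, negated) for head, body in rules for goal, negated in body}


def reachable(arcs, start):
    """Nodes reachable from start by following one or more arcs."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for a, b, _ in arcs:
            if a == node and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def is_stratifiable(rules):
    arcs = dependency_arcs(rules)
    # A negative arc a -> b lies in a strong component iff a is reachable from b.
    return not any(neg and a in reachable(arcs, b) for a, b, neg in arcs)


# The howsoon program, with bodies reduced to predicate names.
howsoon_rules = [
    ("fastest", [("part_cost", False), ("faster", True)]),
    ("faster", [("part_cost", False)]),
    ("basic_subparts", [("part_cost", False)]),
    ("basic_subparts", [("assembly", False), ("basic_subparts", False)]),
    ("timeForbasic", [("basic_subparts", False), ("fastest", False)]),
    ("howsoon", [("timeForbasic", False), ("larger", True)]),
    ("larger", [("timeForbasic", False)]),
]

# Hypothetical non-stratifiable program: win(X) <- move(X, Y), not win(Y).
win_rules = [("win", [("move", False), ("win", True)])]
```

Here is_stratifiable(howsoon_rules) holds because the only recursive predicate, basic_subparts, depends on itself positively, while win depends negatively on itself.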

PDG for howsoon
[Figure: predicate dependency graph with nodes howsoon, larger, timeForbasic, fastest, basic_subpart, faster, assembly, part_cost]

Stratification
By sorting on pdg(P), the nodes of P can be partitioned into a finite set of n strata 1, ..., n, such that, for each rule r ∈ P, the predicate name of the head of r belongs to a stratum that is ≥ each stratum containing some positive goal of r, and strictly > each stratum containing some negated goal of r.
A stratification of a program is called strict if every stratum contains either a single predicate or a set of predicates that are mutually recursive.
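One way to obtain such a numbering is to start every predicate in stratum 1 and raise heads until both conditions hold. The Python sketch below follows that idea. The rule encoding (head, list of (goal, negated) pairs) and the function name stratify are assumptions, and the result is a valid stratification but not necessarily a strict one.

```python
def stratify(rules):
    """Return {predicate: stratum}, or None if the program is not stratifiable."""
    preds = {h for h, _ in rules} | {g for _, body in rules for g, _ in body}
    stratum = {p: 1 for p in preds}
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            for goal, negated in body:
                # Head must be >= a positive goal and > a negated goal.
                needed = stratum[goal] + 1 if negated else stratum[goal]
                if stratum[head] < needed:
                    if needed > len(preds):
                        return None   # a negative arc lies on a cycle
                    stratum[head] = needed
                    changed = True
    return stratum


howsoon_rules = [
    ("fastest", [("part_cost", False), ("faster", True)]),
    ("faster", [("part_cost", False)]),
    ("basic_subparts", [("part_cost", False)]),
    ("basic_subparts", [("assembly", False), ("basic_subparts", False)]),
    ("timeForbasic", [("basic_subparts", False), ("fastest", False)]),
    ("howsoon", [("timeForbasic", False), ("larger", True)]),
    ("larger", [("timeForbasic", False)]),
]
strata = stratify(howsoon_rules)
```

For the howsoon program this places part_cost, assembly, faster and basic_subparts in stratum 1, fastest, timeForbasic and larger in stratum 2, and howsoon in stratum 3.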

One-at-a-Time Computations: needed for aggregates
Set aggregates in SQL, such as count or sum, require that the elements of a set be visited one at a time. (These aggregates also require arithmetic predicates, which we will consider later.)
Counting the elements of a set modulo an integer does not require arithmetic, but it still requires that the elements of the set be visited one at a time.
The parity query: does the base relation br(X) contain an even or an odd number of tuples?

The parity query: how many tuples in the base relation br?
    between(X, Z) ← br(X), br(Y), br(Z), X < Y, Y < Z.
    next(X, Y) ← br(X), br(Y), X < Y, ¬between(X, Y).
    next(nil, X) ← br(X), ¬smaller(X).
    smaller(X) ← br(X), br(Y), Y < X.
    even(nil).
    even(Y) ← odd(X), next(X, Y).
    odd(Y) ← even(X), next(X, Y).
    br_is_even ← even(X), ¬next(X, Y).

next sorts the elements of br into an ascending chain, where the first link of the chain connects the distinguished node nil to the least element in br (third rule in the example).
This works only if br is totally ordered.
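The chain construction can be traced in ordinary code. The following Python sketch mirrors the rules with set comprehensions. The function name br_is_even and the use of None for the distinguished node nil are assumptions made for illustration, and the elements of br are assumed to be mutually comparable with <.

```python
def br_is_even(br):
    br = set(br)
    between = {(x, z) for x in br for y in br for z in br if x < y < z}
    smaller = {x for x in br for y in br if y < x}
    # next: consecutive elements, plus the link from nil (None) to the least element.
    nxt = {(x, y) for x in br for y in br if x < y and (x, y) not in between}
    nxt |= {(None, x) for x in br if x not in smaller}

    # even and odd are mutually recursive: iterate to a fixpoint.
    even, odd = {None}, set()
    changed = True
    while changed:
        new_even = {y for (x, y) in nxt if x in odd}
        new_odd = {y for (x, y) in nxt if x in even}
        changed = not (new_even <= even and new_odd <= odd)
        even |= new_even
        odd |= new_odd

    # br_is_even holds when the last node of the chain is marked even.
    has_successor = {x for (x, _) in nxt}
    return any(x not in has_successor for x in even)
```

Walking the chain marks nil as even, the least element as odd, the next one as even, and so on, so the mark on the last element gives the parity of the whole relation.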

Expressive Power
Data Complexity: query languages are viewed as mappings from the DB to the answer. The big O is evaluated in terms of the size of the database, which is always finite.
DB-PTIME: Polynomial Data Complexity w.r.t. DB size
1. Use Turing machines as the general model of computation and encode the database as a tape of length n.
2. Then any computable function on the database can be encoded as a Turing machine.
3. Some of these machines halt (complete their computation) in O(n) steps, others in an exponential number of steps, and others never terminate.
4. The set of machines that halt in a number of steps that is polynomial in n defines the class of DB-PTIME functions.
Number of tuples, data items, bytes: what DB size are we talking about?

Polynomial Data Complexity
Are relational algebra expressions evaluable in DB-PTIME? Yes, and in practice we use indices and query optimizers to keep exponents and coefficients small.
But these languages cannot express every DB-PTIME query. For instance, they cannot express transitive closures or aggregates (thus the most frequently used aggregates were added to SQL in an ad hoc fashion).

The Expressive Power Hierarchy
Safe, stratified Datalog programs can still be computed in polynomial time and can express every DB-PTIME query; thus they are DB-PTIME complete.
But this is only true if we assume that there exists a total order in the database.
Desiderata: order-independence property of queries (genericity), i.e., queries insensitive to constant renaming.
To express all DB-PTIME queries under genericity, a non-deterministic construct such as choice is needed (subject covered in the ADS book, but not in CS240A).
DB-PTIME completeness is well below Turing completeness; for that you need an infinite universe.

Functors and Complex Terms
Flat parts, with their number, shape and weight, are described by part(Part#, Shape, Weight):
    part(202, circle(11), actualkg(0.034)).
    part(121, rectangle(10, 20), unitkg(2.1)).

    part_weight(No, Kilos) ← part(No, _, actualkg(Kilos)).
    part_weight(No, Kilos) ← part(No, Shape, unitkg(K)), area(Shape, Area), Kilos = K * Area.

    area(circle(Dmtr), A) ← A = Dmtr * Dmtr * 3.14 / 4.
    area(rectangle(Base, Height), A) ← A = Base * Height.

The complex terms circle(11), actualkg(0.034), rectangle(10, 20), and unitkg(2.1) are, in logical parlance, called functions (a functor followed by a list of arguments in parentheses).
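A rough analogue of these rules in a conventional language treats each complex term as a tagged tuple and dispatches on the tag. The Python sketch below makes that assumption: tuples such as ("circle", 11) stand in for circle(11), and the function names area and part_weight simply reuse the predicate names.

```python
# Each complex term is a tagged tuple: (functor, arg1, arg2, ...).
parts = [
    (202, ("circle", 11), ("actualkg", 0.034)),
    (121, ("rectangle", 10, 20), ("unitkg", 2.1)),
]


def area(shape):
    functor, *args = shape
    if functor == "circle":
        (dmtr,) = args
        return dmtr * dmtr * 3.14 / 4
    if functor == "rectangle":
        base, height = args
        return base * height
    raise ValueError(f"unknown shape functor: {functor}")


def part_weight(part):
    no, shape, weight = part
    functor, value = weight
    if functor == "actualkg":      # weight is given directly
        return no, value
    if functor == "unitkg":        # weight is given per unit of area
        return no, value * area(shape)
    raise ValueError(f"unknown weight functor: {functor}")


weights = [part_weight(p) for p in parts]
```

The two branches of part_weight correspond to the two part_weight rules, each selected by the functor that appears in the weight argument.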

Functors (cont.)
In actual applications, these complex terms do not represent functions to be evaluated; they are instead used as variable-length sub-records. Thus, circle(11) and rectangle(10, 20) denote, respectively, that the shape of our first part is a circle with diameter 11 cm, while the shape of the second part is a rectangle with base 10 cm and height 20 cm.
Any number of sub-arguments is allowed in such complex terms, recursively. Objects of arbitrary complexity, including solid objects, can be nested and represented in this fashion.
Functors are then used as case discriminants.

Lists
[] is the empty list.
[Head | Tail] represents a non-empty list.
Example: [mary, mike, seattle] is a shorthand for [mary | [mike | [seattle | []]]].
A list-based representation for the suppliers of top_tube:
    part_sup_list(top_tube, [cinelli, columbus, mavic]).
Lists are only syntactic sugar for a particular function symbol.

Normalizing a nested relation into a flat relation
    flatten(P, S, L) ← part_sup_list(P, [S | L]).
    flatten(P, S, L) ← flatten(P, _, [S | L]).
    ps(Part, Sup) ← flatten(Part, Sup, _).

This program, applied to the previous fact, yields:
    ps(top_tube, cinelli)
    ps(top_tube, columbus)
    ps(top_tube, mavic)

How to Reconstruct the Nested Relation
    between(P, X, Z) ← ps(P, X), ps(P, Y), ps(P, Z), X < Y, Y < Z.
    smaller(P, X) ← ps(P, X), ps(P, Y), Y < X.
    nested(P, [X]) ← ps(P, X), ¬smaller(P, X).
    nested(P, [Y | [X | W]]) ← nested(P, [X | W]), ps(P, Y), X < Y, ¬between(P, X, Y).
    ps_nested(P, W) ← nested(P, W), ¬nested(P, [X | W]).
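The round trip from a nested relation to a flat one and back can be illustrated as follows. This Python sketch reproduces the effect of the rules rather than their rule-by-rule derivation: the function names flatten and nest are assumptions, and Python lists stand in for Datalog lists. As in the rules, each larger supplier is added at the head, so the rebuilt list is in descending order.

```python
def flatten(part_sup_list):
    """Nested facts (part, [suppliers]) -> flat relation ps(part, supplier)."""
    return {(p, s) for p, sups in part_sup_list for s in sups}


def nest(ps):
    """Flat relation ps -> {part: list of suppliers}, largest supplier first."""
    nested = {}
    for p, s in sorted(ps):                    # ascending supplier order per part
        nested.setdefault(p, []).insert(0, s)  # cons each next-larger element at the head
    return nested


facts = [("top_tube", ["cinelli", "columbus", "mavic"])]
ps = flatten(facts)
ps_nested = nest(ps)
```

Starting from the least element and always adding the next element in order plays the role of ¬smaller and ¬between in the rules, and the final complete list is the one selected by ps_nested.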

Conclusion
Recursion and stratified negation assure DB-PTIME completeness for Datalog.
Practical systems such as LogicBlox and DeAL also support function symbols, arithmetic expressions, and many other constructs, which make the language Turing-complete.