CS240A: Databases and Knowledge Bases Recursion and Stratification Carlo Zaniolo Department of Computer Science University of California, Los Angeles December,


CS240A: Databases and Knowledge Bases Recursion and Stratification Carlo Zaniolo Department of Computer Science University of California, Los Angeles December, Notes From Chapter 8 of Advanced Database Systems by Zaniolo, Ceri, Faloutsos, Snodgrass, Subrahmanian and Zicari. Morgan Kaufmann, 1997

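Transitive Closure Queries
The three programs below compute the same path relation for the graph arc(X, Y); they differ only in the form of the recursive rule.
• Transitive closure of the graph arc(X, Y) (right-linear rule):
    path(X, Y) ← arc(X, Y).
    path(X, Z) ← arc(X, Y), path(Y, Z).
• Transitive closure of the graph arc(X, Y) (left-linear rule):
    path(X, Y) ← arc(X, Y).
    path(X, Z) ← path(X, Y), arc(Y, Z).
• Transitive closure of the graph arc(X, Y) (non-linear rule):
    path(X, Y) ← arc(X, Y).
    path(X, Z) ← path(X, Y), path(Y, Z).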
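
As a rough illustration of how the left-linear program can be evaluated bottom-up, here is a minimal Python sketch of semi-naive iteration. The function name transitive_closure and the sample arcs are assumptions made for this example; they do not come from the slides.

```python
def transitive_closure(arcs):
    """Semi-naive evaluation of
         path(X, Y) <- arc(X, Y).
         path(X, Z) <- path(X, Y), arc(Y, Z).
    """
    path = set(arcs)    # tuples produced by the exit rule
    delta = set(arcs)   # tuples that were new in the previous round
    while delta:
        # Join only the newly derived path tuples with arc.
        derived = {(x, z) for (x, y) in delta for (y2, z) in arcs if y == y2}
        delta = derived - path
        path |= delta
    return path


# Hypothetical sample graph: a chain a -> b -> c -> d.
arcs = {("a", "b"), ("b", "c"), ("c", "d")}
closure = transitive_closure(arcs)
```

The loop stops when a round derives no new tuples, which is the least fixpoint of the two rules. Cyclic graphs also terminate, because only tuples not yet in path are carried into the next round.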

Relational Tables for a BoM application

PART_COST
BASIC_PART   SUPPLIER     COST   TIME
top_tube     cinelli
top_tube     columbus
down_tube    columbus
head_tube    cinelli
head_tube    columbus
seat_mast    cinelli
seat_mast    cinelli
seat_stay    cinelli
seat_stay    columbus
chain_stay   columbus
fork         cinelli
fork         columbus
spoke        campagnolo
nipple       mavic        0.10   3
hub          campagnolo
hub          suntour
rim          mavic
rim          araya        7.00   1

ASSEMBLY
PART    SUBPART      QTY
bike    frame        1
bike    wheel        2
frame   top_tube     1
frame   down_tube    1
frame   head_tube    1
frame   seat_mast    1
frame   seat_stay    2
frame   chain_stay   2
frame   fork         1
wheel   spoke        36
wheel   nipple       36
wheel   rim          1
wheel   hub          1
wheel   tire         1

Subparts
• All subparts: a transitive-closure query
    all_subparts(Part, Sub) ← assembly(Part, Sub, _).
    all_subparts(Part, Sub2) ← all_subparts(Part, Sub1), assembly(Sub1, Sub2, _).
• For each part, basic or otherwise, find its basic subparts. A basic part is a subpart of itself:
    basic_subparts(BasicP, BasicP) ← part_cost(BasicP, _, _, _).
    basic_subparts(Prt, BasicP) ← assembly(Prt, SubP, _), basic_subparts(SubP, BasicP).
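
The basic_subparts rules can be traced on the ASSEMBLY table with a small Python sketch. The tuples below are copied from the ASSEMBLY and PART_COST tables of the BoM slide; the Python names (ASSEMBLY, BASIC_PARTS, basic_subparts) are assumptions of this example.

```python
# assembly(Part, Subpart, Qty) facts from the ASSEMBLY table.
ASSEMBLY = [
    ("bike", "frame", 1), ("bike", "wheel", 2),
    ("frame", "top_tube", 1), ("frame", "down_tube", 1),
    ("frame", "head_tube", 1), ("frame", "seat_mast", 1),
    ("frame", "seat_stay", 2), ("frame", "chain_stay", 2),
    ("frame", "fork", 1),
    ("wheel", "spoke", 36), ("wheel", "nipple", 36),
    ("wheel", "rim", 1), ("wheel", "hub", 1), ("wheel", "tire", 1),
]

# Parts that appear in PART_COST, i.e. the basic parts.
BASIC_PARTS = {
    "top_tube", "down_tube", "head_tube", "seat_mast", "seat_stay",
    "chain_stay", "fork", "spoke", "nipple", "hub", "rim",
}


def basic_subparts(assembly, basic_parts):
    # Exit rule: a basic part is a basic subpart of itself.
    result = {(p, p) for p in basic_parts}
    changed = True
    while changed:
        # Recursive rule: join assembly with the tuples derived so far.
        new = {(prt, basic)
               for (prt, sub, _) in assembly
               for (s, basic) in result if s == sub}
        changed = not new <= result
        result |= new
    return result


subparts = basic_subparts(ASSEMBLY, BASIC_PARTS)
bike_basics = {basic for (part, basic) in subparts if part == "bike"}
```

Note that tire never shows up as a basic subpart, since it has no row in PART_COST and the exit rule therefore never fires for it.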

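Negation and Recursion
• For each basic part, find the least time needed for delivery:
    fastest(Part, Time) ← part_cost(Part, Sup1, Cost, Time), ¬faster(Part, Time).
    faster(Part, Time) ← part_cost(Part, Sup2, Cost, Time), part_cost(Part, Sup1, Cost1, Time1), Time1 < Time.
  The second part_cost goal uses its own variable Cost1, so that the two suppliers being compared need not charge the same cost.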

Negation and Recursion (cont.)
• Times required for basic subparts of the given assembly:
    timeForbasic(AssPart, BasicSub, Time) ← basic_subparts(AssPart, BasicSub), fastest(BasicSub, Time).
• The maximum time required for basic subparts of the given assembly:
    howsoon(AssPart, Time) ← timeForbasic(AssPart, _, Time), ¬larger(AssPart, Time).
    larger(Part, Time) ← timeForbasic(Part, _, Time), timeForbasic(Part, _, Time1), Time1 > Time.
Note: to compute howsoon you must first compute larger completely.
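
The note about computing larger completely before howsoon can be made concrete with a small Python sketch that evaluates the predicates one stratum at a time. Most of the sample facts below are hypothetical (the cost and time columns of the original table are largely lost), and the Python names are assumptions of this example.

```python
# part_cost(Part, Supplier, Cost, Time): illustrative values, mostly hypothetical.
PART_COST = [
    ("rim", "mavic", 8.00, 4),
    ("rim", "araya", 7.00, 1),
    ("hub", "campagnolo", 30.00, 5),
    ("hub", "suntour", 25.00, 14),
    ("nipple", "mavic", 0.10, 3),
]

# basic_subparts(AssPart, BasicSub), restricted to the wheel for brevity.
BASIC_SUBPARTS = {("wheel", "rim"), ("wheel", "hub"), ("wheel", "nipple")}

# Lower stratum: faster must be complete before it is negated.
faster = {(p, t)
          for (p, _, _, t) in PART_COST
          for (p1, _, _, t1) in PART_COST if p == p1 and t1 < t}
fastest = {(p, t) for (p, _, _, t) in PART_COST if (p, t) not in faster}

time_for_basic = {(a, b, t)
                  for (a, b) in BASIC_SUBPARTS
                  for (p, t) in fastest if p == b}

# larger must likewise be complete before howsoon negates it.
larger = {(a, t)
          for (a, _, t) in time_for_basic
          for (a1, _, t1) in time_for_basic if a == a1 and t1 > t}
howsoon = {(a, t) for (a, _, t) in time_for_basic if (a, t) not in larger}
```

With these sample facts the wheel is ready once its slowest basic subpart (the hub, at its fastest supplier) has been delivered.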

Predicate Dependency Graph
• The Predicate Dependency Graph for a program P, pdg(P), is a graph having as nodes the names of the predicates in P. The graph contains an arc a → b if there exists a rule with goal name a and head name b. If the goal is negated, then the arc is marked as a negative arc.
• The nodes and arcs of the strong components of pdg(P) identify, respectively, the recursive predicates and the recursive rules of P.
• A program is said to be stratifiable when none of its negative arcs belongs to a strong component.
• Stratifiable programs have a clear meaning; non-stratifiable programs are often ill-defined from a semantic viewpoint.

PCG for howsoon
[Figure: dependency graph for the howsoon program, with nodes howsoon, larger, timeForbasic, basic_subpart, fastest, faster, assembly and part_cost.]

Stratification
• By sorting on pdg(P), the nodes of P can be partitioned into a finite set of n strata 1, ..., n, such that, for each rule r ∈ P, the predicate name of the head of r belongs to a stratum that
    is ≥ each stratum containing some positive goal of r, and also
    is strictly > each stratum containing some negated goal of r.
• A stratification of a program is called strict if every stratum contains either a single predicate or a set of predicates that are mutually recursive.
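
One common way to find a stratification is to raise stratum numbers iteratively until both conditions hold, and to give up when a number exceeds the count of predicates. The Python sketch below follows that idea; it is an illustration rather than the procedure used in the course, and the function name stratify and the arc encoding (goal, head, negated) are assumptions. The arcs are read off the fastest/howsoon rules of the earlier slides.

```python
def stratify(predicates, arcs):
    """Return {predicate: stratum} with strata numbered from 1,
    or None if the program is not stratifiable.
    Each arc is (goal, head, negated)."""
    stratum = {p: 1 for p in predicates}
    changed = True
    while changed:
        changed = False
        for goal, head, negated in arcs:
            # head >= goal for positive goals, head > goal for negated ones
            required = stratum[goal] + (1 if negated else 0)
            if stratum[head] < required:
                if required > len(predicates):
                    return None  # a negative arc lies on a cycle
                stratum[head] = required
                changed = True
    return stratum


PREDICATES = ["assembly", "part_cost", "basic_subparts", "faster",
              "fastest", "timeForbasic", "larger", "howsoon"]
ARCS = [
    ("part_cost", "basic_subparts", False),
    ("assembly", "basic_subparts", False),
    ("basic_subparts", "basic_subparts", False),
    ("part_cost", "faster", False),
    ("part_cost", "fastest", False),
    ("faster", "fastest", True),
    ("basic_subparts", "timeForbasic", False),
    ("fastest", "timeForbasic", False),
    ("timeForbasic", "larger", False),
    ("timeForbasic", "howsoon", False),
    ("larger", "howsoon", True),
]

strata = stratify(PREDICATES, ARCS)
# A two-rule program p <- not q, q <- not p has a negative cycle.
unstratifiable = stratify(["p", "q"], [("q", "p", True), ("p", "q", True)])
```

The result is a valid stratification but not necessarily a strict one, since unrelated predicates may share a stratum number.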

One-at-a-Time Computations: needed for aggregates
• Set aggregates in SQL, such as count or sum, require that the elements of a set be visited one at a time. (These aggregates also require arithmetic predicates, which we will consider later.)
• Counting the elements in a set modulo an integer does not require arithmetic, but still requires that the elements of the set be visited one at a time.
• The parity query: does the base relation br(X) contain an even number or an odd number of tuples?

The parity query: how many tuples in the base relation br
    between(X, Z) ← br(X), br(Y), br(Z), X < Y, Y < Z.
    next(X, Y) ← br(X), br(Y), X < Y, ¬between(X, Y).
    next(nil, X) ← br(X), ¬smaller(X).
    smaller(X) ← br(X), br(Y), Y < X.
    even(nil).
    even(Y) ← odd(X), next(X, Y).
    odd(Y) ← even(X), next(X, Y).
    br_is_even ← even(X), ¬next(X, Y).
next sorts the elements of br into an ascending chain, where the first link of the chain connects the distinguished node nil to the least element in br (third rule in the example). This works only if br is totally ordered.
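
To see the chain construction at work, the rules can be mimicked in Python, one relation per set. This is only a sketch: nil is modeled by None, the function name br_is_even mirrors the head of the last rule, and the sample inputs in the usage lines are hypothetical.

```python
NIL = None  # stands for the distinguished node nil


def br_is_even(br):
    br = set(br)
    # Lower strata: between and smaller, then next (which negates both).
    between = {(x, z) for x in br for y in br for z in br if x < y < z}
    smaller = {x for x in br for y in br if y < x}
    nxt = {(x, y) for x in br for y in br
           if x < y and (x, y) not in between}
    nxt |= {(NIL, x) for x in br if x not in smaller}

    # even and odd are mutually recursive, so iterate to a fixpoint.
    even, odd = {NIL}, set()
    changed = True
    while changed:
        new_even = {y for (x, y) in nxt if x in odd}
        new_odd = {y for (x, y) in nxt if x in even}
        changed = not (new_even <= even and new_odd <= odd)
        even |= new_even
        odd |= new_odd

    # br_is_even holds if the last node of the chain is even.
    has_next = {x for (x, _) in nxt}
    return any(x not in has_next for x in even)


three_elements = br_is_even([3, 1, 2])
two_elements = br_is_even([4, 2])
```

The elements are visited one at a time along the next chain, alternating between odd and even, which is exactly why a total order on br is needed.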

Expressive Power
Data complexity: query languages are viewed as mappings from the DB to the answer. The big O is evaluated in terms of the size of the database, which is always finite.
• DB-PTIME: polynomial data complexity w.r.t. DB size
    1. Use Turing machines as the general model of computation and encode the database as a tape of length n.
    2. Then any computable function on the database can be encoded as a Turing machine.
    3. Some of these machines halt (complete their computation) in O(n) steps, others in an exponential number of steps, and others never terminate.
    4. The set of machines that halt in a number of steps that is polynomial in n defines the class of DB-PTIME functions.
Number of tuples, data items, or bytes: what DB size are we talking about?

Polynomial Data Complexity
• Are relational algebra expressions evaluable in DB-PTIME?
• Yes, and in practice we use indices and query optimizers to keep exponents and coefficients small.
• But relational languages cannot express every DB-PTIME query. For instance, they cannot express transitive closures or aggregates (thus the most frequently used aggregates were added to SQL in ad hoc fashion).

The Expressive Power Hierarchy
• Safe, stratified Datalog programs
    can still be computed in polynomial time, and
    express every DB-PTIME query; thus
    they are DB-PTIME complete.
• But this is only true if we assume that there exists a total order in the database.
• Desiderata: order-independence property of queries (genericity), i.e., queries insensitive to constant renaming.
• To express all DB-PTIME queries under genericity, a non-deterministic construct such as choice is needed (subject covered in the ADS book, but not in CS240A).
• DB-PTIME completeness is well below Turing completeness; for that you need an infinite universe.

Functors and Complex Terms
• Flat parts, their number, shape and weight, following the schema part(Part#, Shape, Weight):
    part(202, circle(11), actualkg(0.034)).
    part(121, rectangle(10, 20), unitkg(2.1)).
    part_weight(No, Kilos) ← part(No, _, actualkg(Kilos)).
    part_weight(No, Kilos) ← part(No, Shape, unitkg(K)), area(Shape, Area), Kilos = K * Area.
    area(circle(Dmtr), A) ← A = Dmtr * Dmtr * 3.14/4.
    area(rectangle(Base, Height), A) ← A = Base * Height.
• The complex terms circle(11), actualkg(0.034), rectangle(10, 20), and unitkg(2.1) are, in logical parlance, called functions (a functor followed by a list of arguments in parentheses).
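
The way functors act as case discriminants in part_weight and area can be sketched in Python by encoding each complex term as a tuple whose first component is the functor name. This encoding and the Python names are assumptions of the example; the two part facts are the ones from the slide.

```python
# part(Part#, Shape, Weight) with complex terms encoded as tagged tuples.
PARTS = [
    (202, ("circle", 11), ("actualkg", 0.034)),
    (121, ("rectangle", 10, 20), ("unitkg", 2.1)),
]


def area(shape):
    functor, *args = shape
    if functor == "circle":        # area(circle(Dmtr), A)
        (dmtr,) = args
        return dmtr * dmtr * 3.14 / 4
    if functor == "rectangle":     # area(rectangle(Base, Height), A)
        base, height = args
        return base * height
    raise ValueError(f"no area rule for functor {functor!r}")


def part_weight(part):
    no, shape, weight = part
    functor, value = weight
    if functor == "actualkg":      # weight is stored directly
        return no, value
    if functor == "unitkg":        # weight per unit of area
        return no, value * area(shape)
    raise ValueError(f"no weight rule for functor {functor!r}")


weights = [part_weight(p) for p in PARTS]
```

Each branch corresponds to one rule, selected by the functor of the term, much as the rule heads select on actualkg versus unitkg and on circle versus rectangle.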

Functors (cont.)
• In actual applications, these complex terms do not represent functions to be evaluated; they are instead used as variable-length sub-records.
• Thus, circle(11) and rectangle(10, 20), respectively, denote that the shape of our first part is a circle with diameter 11 cm, while the shape of the second part is a rectangle with base 10 cm and height 20 cm. Any number of sub-arguments is allowed in such complex terms, recursively.
• Objects of arbitrary complexity, including solid objects, can be nested and represented in this fashion. Functors are then used as case discriminants.

Lists
• [] is the empty list.
• [Head | Tail] represents a non-empty list.
• Example: [mary, mike, seattle] is a shorthand for [mary | [mike | [seattle | [ ]]]].
• A list-based representation for suppliers of top_tube:
    part_sup_list(top_tube, [cinelli, columbus, mavic]).
• Lists are only syntactic sugaring for a particular function symbol.

Normalizing a nested relation into a flat relation
    flatten(P, S, L) ← part_sup_list(P, [S | L]).
    flatten(P, S, L) ← flatten(P, _, [S | L]).
    ps(Part, Sup) ← flatten(Part, Sup, _).
• This program, applied to the previous fact, yields:
    ps(top_tube, cinelli)
    ps(top_tube, columbus)
    ps(top_tube, mavic)
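
A Python sketch of the flatten rules, with lists encoded as nested (head, tail) pairs and the empty tuple playing the role of [], may help show how each rule application peels one element off the list. The helper cons_list and the other Python names are assumptions of this example.

```python
def cons_list(items):
    """Build [a, b, c] as (a, (b, (c, ()))), mirroring [Head | Tail]."""
    lst = ()
    for item in reversed(items):
        lst = (item, lst)
    return lst


def flatten(part_sup_list):
    # flatten(P, S, L) <- part_sup_list(P, [S | L]).
    flat = {(p, lst[0], lst[1]) for (p, lst) in part_sup_list if lst}
    frontier = set(flat)
    while frontier:
        # flatten(P, S, L) <- flatten(P, _, [S | L]).
        new = {(p, rest[0], rest[1])
               for (p, _, rest) in frontier if rest} - flat
        flat |= new
        frontier = new
    return flat


def ps(part_sup_list):
    # ps(Part, Sup) <- flatten(Part, Sup, _).
    return {(p, s) for (p, s, _) in flatten(part_sup_list)}


facts = [("top_tube", cons_list(["cinelli", "columbus", "mavic"]))]
flat_relation = ps(facts)
```

The recursion stops when the remaining tail is the empty list, since [] does not match the pattern [S | L].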

How to Reconstruct the Nested Relation
    between(P, X, Z) ← ps(P, X), ps(P, Y), ps(P, Z), X < Y, Y < Z.
    smaller(P, X) ← ps(P, X), ps(P, Y), Y < X.
    nested(P, [X]) ← ps(P, X), ¬smaller(P, X).
    nested(P, [Y | [X | W]]) ← nested(P, [X | W]), ps(P, Y), X < Y, ¬between(P, X, Y).
    ps_nested(P, W) ← nested(P, W), ¬nested(P, [X | W]).
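
The reconstruction can be followed step by step with a Python sketch in which lists are nested (head, tail) pairs and () stands for []. The encoding and the function name ps_nested are assumptions of this example; the input facts are the ps tuples produced from the top_tube supplier list.

```python
def ps_nested(ps):
    between = {(p, x, z)
               for (p, x) in ps for (p1, y) in ps for (p2, z) in ps
               if p == p1 == p2 and x < y < z}
    smaller = {(p, x) for (p, x) in ps for (p1, y) in ps
               if p == p1 and y < x}

    # nested(P, [X]) <- ps(P, X), not smaller(P, X).
    nested = {(p, (x, ())) for (p, x) in ps if (p, x) not in smaller}
    frontier = set(nested)
    while frontier:
        # Prepend the immediate successor Y of the current head X.
        new = {(p, (y, lst))
               for (p, lst) in frontier
               for (p1, y) in ps
               if p == p1 and lst[0] < y
               and (p, lst[0], y) not in between} - nested
        nested |= new
        frontier = new

    # ps_nested keeps only lists that are not the tail of a longer one.
    tails = {(p, lst[1]) for (p, lst) in nested}
    return {(p, lst) for (p, lst) in nested if (p, lst) not in tails}


PS = {("top_tube", "cinelli"), ("top_tube", "columbus"), ("top_tube", "mavic")}
result = ps_nested(PS)
```

Because each step prepends the next larger element, the final list comes out in descending order, here [mavic, columbus, cinelli].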

Conclusion
• Recursion and stratified negation assure DB-PTIME completeness for Datalog, provided the database is totally ordered.
• Practical systems such as LDL++ also support function symbols, arithmetic expressions, and many other constructs.