Querying Infinite Databases: Safety of Datalog Queries over Infinite Databases (Sagiv and Vardi ’90); Queries and Computation on the Web (Abiteboul and Vianu ’97)


1 Querying Infinite Databases
Safety of Datalog Queries over Infinite Databases (Sagiv and Vardi ’90)
Queries and Computation on the Web (Abiteboul and Vianu ’97)
Itay Maman, Student Symposium, 5 July 2006

2/19 Simple Technion Queries…
(Domain: the Technion’s students database)
Q1: Which courses did Gidi attend?
  SELECT course FROM students WHERE name='Gidi'
Q2: Which students took course 234218?
  SELECT name FROM students WHERE course='234218'
Table (course | name): … | Gidi, … | Gidi, … | Dina, …

3/19 Simple Web Queries…
Q3: Which pages does my home page link to?
  SELECT target FROM links WHERE source='…'
Q4: Which pages link to my home page?
  SELECT source FROM links WHERE target='…'
Q4 is challenging:
  - No matter how long my web crawler works, I can never find all incoming links of a page!
  - This is an infinite query: the more you crawl, the more answers you get
  - (In Q3 the size of the result set is bounded)
Table links (source | target): …

4/19 Leading questions
  - What does an infinite DB look like?
  - Can we evaluate a query over an infinite DB?
  - Can we determine the finiteness of a query?
But first, some Datalog…

5/19 Datalog
Why Datalog?
  - Supports recursion/transitive closure (unlike SQL without recursive queries); recursion is essential in large data sets
  - Terminates if the DB is finite
  - Very simple: program = a collection of rules; rule = a sequence of terms
In our program:
  - Three rules
  - Two queries (AKA IDB): g(X), small(X,Y)
  - One table (AKA EDB): before(X,Y)
  - A goal predicate from which execution starts; we choose g(X) as the goal
g(X) :- small(X,2).
small(X,Y) :- before(X,Y).
small(X,Y) :- small(X,Z), before(Z,Y).

6/19 Finiteness
A DB is finite if every table is a finite set
  - before(X,Y) = { (0,1), (1,2), (2,3) }
Possible evaluation schemes:
  - Brute force
  - Bottom-up
  - Optimizations
The requirement: finiteness of tables
The guarantee: termination of the Datalog program

7/19 Infinity
Here is another definition for our table:
  - before(X,Y) = { (X,X+1) | X ≥ 0 }
We now have an infinite DB
  - The problem: we cannot iterate over the tuples in the set
  - The solution: a top-down algorithm
Such tables are quite common:
  - The internet links relation: links(X,Y) = { (X,Y) | page X links to page Y }
  - Java’s subclassing relation: extends(X,Y) = { (X,Y) | class X extends Y }
Leading question: What does an infinite DB look like?

8/19 Example: Top-down evaluation
g(W) :- small(W,2).
small(A,B) :- before(A,B).
small(X,Y) :- small(X,Z), before(Z,Y).
before(X,Y) = { (X,X+1) | X ≥ 0 }
Notation: b = before, s = small, ⋈ = join
s(X,Y) = b(X,Y) ∪ s(X,Z) ⋈ b(Z,Y)
g(W) = s(W,2)
= b(W,2) ∪ s(W,Z) ⋈ b(Z,2)
= {(1,2)} ∪ s(W,1) ⋈ {(1,2)}
= {(1,2)} ∪ [b(W,1) ∪ s(W,Z) ⋈ b(Z,1)] ⋈ {(1,2)}
= {(1,2)} ∪ [{(0,1)} ∪ s(W,0) ⋈ {(0,1)}] ⋈ {(1,2)}
= {(1,2)} ∪ [{(0,1)} ∪ [b(W,0) ∪ s(W,Z) ⋈ b(Z,0)] ⋈ {(0,1)}] ⋈ {(1,2)}
= {(1,2)} ∪ [{(0,1)} ∪ [∅ ∪ s(W,Z) ⋈ ∅] ⋈ {(0,1)}] ⋈ {(1,2)}
= {(1,2)} ∪ [{(0,1)} ∪ ∅ ⋈ {(0,1)}] ⋈ {(1,2)}
= {(1,2)} ∪ {(0,1)} ⋈ {(1,2)}
= {(1,2)} ∪ {(0,2)}
= {(1,2), (0,2)}

9/19 Top-down evaluation
The top-down algorithm:
  - Init: assign r ← body of the goal
  - Loop: (intelligently) pick a term, t, from r
      - If t is a query term: replace it with the union of the rules indicated by t
      - If t is a table term: replace it with the set generated by the table
      - Replace s ⋈ ∅ expressions (in r) with ∅
      - Replace s ∪ ∅ expressions (in r) with s
      - Evaluate relational algebra expressions (if both sides are known)
  - Stop if no further replacements can be made
Leading question: Can we evaluate a query over an infinite DB? Yes
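The top-down idea can be sketched in Python for the small(W,2) example. This is only an illustration under stated assumptions, not the slides' term-rewriting procedure: the infinite table before is modelled by a hypothetical lookup function `before_into(y)` that returns the tuples whose second argument is already bound, and `small_into` and `goal` are likewise illustrative names.

```python
def before_into(y):
    """Tuples (x, y) of before = {(x, x+1) | x >= 0} whose second argument is y.

    The table is infinite, but with y bound at most one tuple matches.
    """
    return {(y - 1, y)} if y >= 1 else set()


def small_into(y):
    """All tuples (x, y) of small, evaluated top-down with y bound."""
    # Rule: small(X,Y) :- before(X,Y).
    result = set(before_into(y))
    # Rule: small(X,Y) :- small(X,Z), before(Z,Y).
    # before(Z,Y) is looked up first because Y is bound; that binds Z.
    for (z, _) in before_into(y):
        result |= {(x, y) for (x, _) in small_into(z)}
    return result


def goal():
    """g(W) :- small(W,2)."""
    return {w for (w, _) in small_into(2)}


print(sorted(small_into(2)))  # [(0, 2), (1, 2)]
print(sorted(goal()))         # [0, 1]
```

The recursion stops because each call lowers the bound value until `before_into(0)` returns the empty set, which mirrors the point in the worked example where b(W,0) becomes ∅.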

10/19 Infinite Queries
Can the top-down algorithm run forever? Yes
Case 1: A table that returns an infinite result
  - evenProduct(X,Y) = { (X,Y) | X*Y mod 2 = 0 }
  - divides(X,Y) = { (X,Y) | X mod Y = 0 }
  - links(X,Y) = { (X,Y) | page X links to page Y }
Weak safety: all intermediate results are finite
Result #1 (Sagiv and Vardi ’90):
  - Weak safety is decidable given the F/C (finiteness constraints) of the tables
      - F/C of evenProduct: none
      - F/C of divides: X => Y
      - F/C of links: X => Y
  - Algorithm: tracking the flow of values from assigned variables
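The value-tracking idea can be illustrated with a much simplified check. It is not the Sagiv and Vardi decision procedure: it looks at a single rule body, ignores recursion, and assumes that a constraint X => Y means each value of X is paired with finitely many values of Y. The encoding of constraints and all function names below are hypothetical.

```python
# Hypothetical encoding of finiteness constraints (F/C): for each table, a list
# of (source positions, target position) pairs. ((0,), 1) encodes "X => Y".
FINITENESS_CONSTRAINTS = {
    "evenProduct": [],
    "divides": [((0,), 1)],
    "links": [((0,), 1)],
}


def is_variable(term):
    """Datalog convention: variables are strings starting with an upper-case letter."""
    return isinstance(term, str) and term[:1].isupper()


def bounded_variables(body):
    """Variables of a rule body that can only take finitely many values."""
    bounded = set()
    changed = True
    while changed:
        changed = False
        for table, args in body:
            for sources, target in FINITENESS_CONSTRAINTS.get(table, []):
                term = args[target]
                if not is_variable(term) or term in bounded:
                    continue
                # A source is finite if it is a constant or an already bounded variable.
                if all(not is_variable(args[i]) or args[i] in bounded for i in sources):
                    bounded.add(term)
                    changed = True
    return bounded


def body_is_finite(body):
    """True if every variable in the body is reached by the flow of bound values."""
    variables = {t for _, args in body for t in args if is_variable(t)}
    return variables <= bounded_variables(body)


outgoing = [("links", ("home", "Y"))]                          # like Q3
incoming = [("links", ("X", "home"))]                          # like Q4
two_hops = [("links", ("home", "Y")), ("links", ("Y", "Z"))]

print(body_is_finite(outgoing), body_is_finite(incoming), body_is_finite(two_hops))
```

Under these assumptions the outgoing-links body is reported finite and the incoming-links body is not, which matches the contrast between Q3 and Q4.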

11/19 Infinite Queries (cont.)
Can the top-down algorithm run forever? Yes
Case 2: The algorithm’s recursion never stops
  - A query/table is used in its “unbounded” direction
g(W) :- small(2,W).
small(A,B) :- before(A,B).
small(X,Y) :- small(X,Z), before(Z,Y).
before(X,Y) = { (X,X+1) | X ≥ 0 }
s(X,Y) = b(X,Y) ∪ s(X,Z) ⋈ b(Z,Y)
g(W) = s(2,W)
= b(2,W) ∪ s(2,Z) ⋈ b(Z,W)
= {(2,3)} ∪ s(2,Z) ⋈ b(Z,W)
= {(2,3)} ∪ [b(2,Z) ∪ s(2,Z’) ⋈ b(Z’,Z)] ⋈ b(Z,W)
…
Results #2-3 (Sagiv and Vardi ’90):
  - Termination is undecidable in the general case
  - Termination is decidable if all queries are unary
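A small sketch of why small(2,W) cannot finish: its answer set is itself unbounded. The generator below is an iterative reformulation, not the slide's expansion, and `before_from` and `small_from` are hypothetical names. It enumerates answers lazily, so a caller has to cut it off explicitly.

```python
from itertools import islice


def before_from(x):
    """The single tuple of before = {(x, x+1) | x >= 0} whose first argument is x."""
    return (x, x + 1)


def small_from(x):
    """Lazily enumerate every W with small(x, W); this generator never ends."""
    z = x
    while True:
        _, z = before_from(z)
        yield z


# Only a bounded prefix of the answers can ever be materialised.
first_five = list(islice(small_from(2), 5))
print(first_five)  # [3, 4, 5, 6, 7]
```

Calling `list(small_from(2))` without `islice` would run forever, which is the behaviour the slide attributes to using a table in its unbounded direction.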

12/19 Infinite Queries (summ.)
  - We can automatically determine weak safety
  - We cannot (automatically) determine termination
  - But one can analytically prove that a given query over a given DB is finite, e.g., our small(W,2) program
Leading question: Can we determine the finiteness of a query? No (not in general)

13/19 The Web as a DB
The web data model (WDM):
  - A schema of a DB that can represent the web graph
  - Just three tables:
      urls = { u | u is a URL of a web page }
      links = { (u1,u2) | u1 links to u2; u1, u2 ∈ urls }
      Words = { (u,w) | w appears in page u; u ∈ urls }
Result #4 (Abiteboul and Vianu ’97):
  - If a Datalog program with no literals halts over an infinite DB, its result is ∅
  - => A non-trivial query (over an infinite DB) must have a literal

14/19 Web Machines
Browsing machine
  - A weakly safe Datalog program (over WDM)
  - At least one URL literal
Searching/browsing machine
  - An unsafe Datalog program (over WDM); evaluates queries in parallel
  - Allowed literal types: URLs, words
Claims #1-2 (Abiteboul and Vianu ’97):
  - Browsing machine: represents a user following static links from a page
  - Searching/browsing machine: also allows the user to access a search engine

15/19 Discussion: Finite approximation
Relational database servers are very popular
  - Such DBs are finite
Also, computing a table on demand may be slow
  - Better performance at batch processing
The challenge: build a finite replacement for an infinite DB
Formally:
  - Given a finite query, q, over an infinite DB
  - (Finiteness of q proved analytically)
  - Build a finite database such that q over the finite database yields the same result as q over the infinite DB

16/19 Discussion: Finite approximation
Example: our small(W,2) program
  - A finite, sound table: before(X,Y) = { (0,1), (1,2) }
  - A finite, unsound table: before(X,Y) = { (0,1) }
The process:
  - Compute the transitive closure of the before relation
  - Start from the literal ‘2’ at the right-hand side position
Condition: the table graph must end with a sink
  - In before the sink is the vertex ‘0’ => we can build a finite DB
  - Sadly, in the web graph no such sink exists
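The process can be sketched as a backward walk over the before graph, starting from the literal 2 and collecting every tuple met on the way. This is a minimal sketch under the assumption that the table can be queried with its second argument bound; `before_into` and `finite_replacement` are hypothetical names.

```python
def before_into(y):
    """Tuples (x, y) of before = {(x, x+1) | x >= 0} whose second argument is y."""
    return {(y - 1, y)} if y >= 1 else set()


def finite_replacement(start):
    """Collect the tuples of before that are reachable backward from `start`.

    The walk ends only because it eventually reaches a vertex with no
    incoming tuple (vertex 0 here); without one it would not terminate.
    """
    table = set()
    seen = {start}
    frontier = [start]
    while frontier:
        y = frontier.pop()
        for (x, _) in before_into(y):
            table.add((x, y))
            if x not in seen:
                seen.add(x)
                frontier.append(x)
    return table


print(sorted(finite_replacement(2)))  # [(0, 1), (1, 2)]
```

The result is the finite, sound table from the example. A walk that stopped early and kept only (0,1) would give the unsound one.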

17/19 Discussion: Temporality
  - Crawling takes time
  - The subject may change while crawling
      - The DB is a snapshot which never happened
  - (Open question): Can we decide whether a result was really “true” at some point?

18/19 More issues
  - Relational algebra over large relations: BDD
  - Negation: Stratified Datalog

19/19 - Questions ? -

21/19 Datalog
Semantics: ???
Straightforward mapping to relational algebra??
g(X) :- small(X,2).
small(X,Y) :- before(X,Y).
small(X,Y) :- small(X,Z), before(Z,Y).

22/19 Example: Bottom-up evaluation
Initialization: translate the EDBs into relations
[Table: before(X, Y)]

23/19 Example: Bottom-up evaluation
Apply small(X,Y) :- before(X,Y).
[Tables: before(X, Y) → small(X, Y)]

24/19 Example: Bottom-up evaluation
Apply small(X,Y) :- small(X,Z), before(Z,Y).
[Tables: small(X, Z) joined with before(Z, Y) → small(X, Y)]

25/19 Example: Bottom-up evaluation
Apply g(X) :- small(X,2).
[Tables: small(X, Y) → g(X) = { 1, 0 }]

26/19 Finiteness
before(X,Y) = { (0,1), (1,2), (2,3) }
The bottom-up algorithm:
  - Init:
      - For each EDB, p, assign r(p) ← relation of all tuples satisfying p
      - For each IDB, p, assign r(p) ← ∅
  - Loop:
      - Choose a rule p(…) :- t1(…), t2(…), …, tn(…)
      - t ← join of all r(ti), where 1 ≤ i ≤ n
      - r(p) ← r(p) ∪ t
  - Continue until a fixpoint is reached
Requires: finiteness of EDBs
Ensures: termination
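The loop above can be sketched in Python for the three-rule example program. It is a naive evaluation hard-wired to these rules, not a general Datalog engine, and it applies all rules in every round instead of choosing one at a time; the function name `bottom_up` is hypothetical.

```python
def bottom_up(before):
    """Naive bottom-up evaluation of g and small over a finite before table."""
    small, g = set(), set()  # Init: every IDB relation starts empty.
    while True:
        # Rule: small(X,Y) :- before(X,Y).
        derived_small = set(before)
        # Rule: small(X,Y) :- small(X,Z), before(Z,Y).  (join on Z)
        derived_small |= {(x, y) for (x, z) in small for (z2, y) in before if z == z2}
        # Rule: g(X) :- small(X,2).
        derived_g = {x for (x, y) in small if y == 2}

        new_small, new_g = small | derived_small, g | derived_g
        if new_small == small and new_g == g:
            return small, g  # Fixpoint: a full round added nothing.
        small, g = new_small, new_g


small, g = bottom_up({(0, 1), (1, 2), (2, 3)})
print(sorted(small))  # [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(sorted(g))      # [0, 1]
```

Termination here depends on before being finite: only finitely many tuples can be built from its values, so the relations must stop growing.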