
1 Querying Infinite Databases
Safety of Datalog Queries over Infinite Databases (Sagiv and Vardi ’90)
Queries and Computation on the Web (Abiteboul and Vianu ’97)
Itay Maman, 049011 Student Symposium, 5 July 2006

2 2/19 Simple Technion Queries…
(Domain: The Technion’s students database)
Q1: Which courses did Gidi attend?
  SELECT course FROM students WHERE name='Gidi'
Q2: Which students took 234218?
  SELECT name FROM students WHERE course='234218'
courses
  course  name
  234218  Gidi
  236703  Gidi
  234218  Dina
  …       …

3 3/19 Simple Web Queries…
Q3: Which pages does my home page link to?
  SELECT target FROM links WHERE source='www.geocities.com/mysite'
Q4: Which pages link to my home page?
  SELECT source FROM links WHERE target='www.geocities.com/mysite'
Q4 is challenging:
- No matter how long my web-crawler works…
- … I can never find all incoming links of a page!
- This is an infinite query
  The more you crawl, the more answers you get
- (In Q3 the size of the result set is bounded)
links
  source                    target
  www.google.com            www.google.co.il
  www.geocities.com/mysite  www.ynet.co.il
  www.cnn.com               www.geocities.com/mysite
  …                         …
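
The contrast between Q3 and Q4 can be sketched in Python. This is only an illustration under loud assumptions: the "web" here is a made-up infinite graph in which page n links to page n + 1 and to page 0, and `out_links`, `q3` and `q4_partial` are hypothetical names, not part of the talk.

```python
from collections import deque

def out_links(page):
    """Hypothetical stand-in for fetching a page: page n links to n + 1 and to 0."""
    return {page + 1, 0}

def q3(page):
    """Q3: the targets of one page. A single lookup, so the result is bounded."""
    return out_links(page)

def q4_partial(target, start, budget):
    """Q4: crawl at most `budget` pages from `start`, collecting pages that link to `target`."""
    found, seen, queue = set(), {start}, deque([start])
    while queue and budget > 0:
        page = queue.popleft()
        budget -= 1
        for nxt in out_links(page):
            if nxt == target:
                found.add(page)      # `page` is an incoming link of `target`
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return found
```

In this toy graph every page links to page 0, so a larger crawl budget for `q4_partial(0, 0, budget)` always yields more answers, while `q3` returns two targets regardless of how large the graph is.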

4 4/19 Leading questions
- What does an infinite DB look like?
- Can we evaluate a query over an infinite DB?
- Can we determine the finiteness of a query?
But first, some Datalog…

5 5/19 Datalog
Why Datalog?
- Supports recursion/transitive closure (unlike SQL without recursive queries)
  Recursion is essential in large data-sets
- Terminates if DB is finite
- Very simple
program = A collection of rules
rule = A sequence of terms
In our program:
- Three rules
- Two queries (AKA: IDB): g(X), small(X,Y)
- One table (AKA: EDB): before(X,Y)
- A goal predicate from which execution starts
  We choose g(X) as the goal
g(X) :- small(X,2).
small(X,Y) :- before(X,Y).
small(X,Y) :- small(X,Z), before(Z,Y).

6 6/19 Finiteness
A DB is finite if every table is a finite set
- before(X,Y) = { (0,1), (1,2), (2,3) }
Possible evaluation schemes:
- Brute force
- Bottom up
  Optimizations
The requirement: Finiteness of tables
The guarantee: Termination of the Datalog program

7 7/19 Infinity
Here is another definition for our table
- before(X,Y) = { (X,X+1) | X ≥ 0 }
We now have an infinite DB
- The problem: we cannot iterate over the tuples in the set
- The solution: Top-down algorithm
Such tables are quite common
- The internet links relation
  links(X,Y) = { (X,Y) | page X links to page Y }
- Java’s subclassing relation
  extends(X,Y) = { (X,Y) | class X extends Y }
Leading question: What does an infinite DB look like?
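
One way to picture such a table in code is as a set of access functions rather than a stored list of tuples. The sketch below is an assumption-laden illustration in Python (the function names are hypothetical): enumerating the whole table never ends, but a lookup with one column bound returns a finite set.

```python
from itertools import count, islice

def before_tuples():
    """Enumerate before = {(X, X+1) | X >= 0}; this generator never ends."""
    for x in count(0):
        yield (x, x + 1)

def before_given_y(y):
    """All X with before(X, y): at most one value, so the lookup is finite."""
    return {y - 1} if y >= 1 else set()

def before_given_x(x):
    """All Y with before(x, Y): exactly one value when x >= 0."""
    return {x + 1} if x >= 0 else set()

# Only a prefix of the table can ever be materialized.
first_three = list(islice(before_tuples(), 3))
```

A top-down evaluator only needs the bound lookups, which is why it can work where iterating over all tuples cannot.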

8 8/19 Example: Top-down evaluation
g(W) :- small(W,2).
small(A,B) :- before(A,B).
small(X,Y) :- small(X,Z), before(Z,Y).
before(X,Y) = { (X,X+1) | X ≥ 0 }
Notation: b : before, s : small, ⋈ : Join
s(X,Y) = b(X,Y) ∪ s(X,Z) ⋈ b(Z,Y)
g(W) = s(W,2)
 = b(W,2) ∪ s(W,Z) ⋈ b(Z,2)
 = {(1,2)} ∪ s(W,1) ⋈ {(1,2)}
 = {(1,2)} ∪ [b(W,1) ∪ s(W,Z) ⋈ b(Z,1)] ⋈ {(1,2)}
 = {(1,2)} ∪ [{(0,1)} ∪ s(W,0) ⋈ {(0,1)}] ⋈ {(1,2)}
 = {(1,2)} ∪ [{(0,1)} ∪ [b(W,0) ∪ s(W,Z) ⋈ b(Z,0)] ⋈ {(0,1)}] ⋈ {(1,2)}
 = {(1,2)} ∪ [{(0,1)} ∪ [∅ ∪ s(W,Z) ⋈ ∅] ⋈ {(0,1)}] ⋈ {(1,2)}
 = {(1,2)} ∪ [{(0,1)} ∪ ∅ ⋈ {(0,1)}] ⋈ {(1,2)}
 = {(1,2)} ∪ {(0,1)} ⋈ {(1,2)}
 = {(1,2)} ∪ {(0,2)}
 = {(1,2), (0,2)}

9 9/19 Top-down evaluation
The Top-down algorithm
- Init: assign r ← body of the goal
- Loop: (Intelligently) pick a term, t, from r
  If t is a query term:
    Replace it with the union of the rules indicated by t
  If t is a table term:
    Replace it with the set generated by the table
  Replace s ⋈ ∅ expressions (in r) with ∅
  Replace s ∪ ∅ expressions (in r) with s
  Evaluate relational algebra expressions (if both sides are known)
- Stop if no further replacements can be made
Leading question: Can we evaluate a query over an infinite DB? Yes
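
For the small(W,2) program, the top-down idea can be sketched as a recursive Python function. This is not the general rewriting algorithm from the slide: it hard-codes the two rules for small, always evaluates the table term first, and assumes the hypothetical lookup `before_given_y` for before(X,Y) = { (X,X+1) | X ≥ 0 }.

```python
def before_given_y(y):
    """All X with before(X, y) for before = {(X, X+1) | X >= 0}."""
    return {y - 1} if y >= 1 else set()

def small_given_y(y):
    """All X with small(X, y), obtained by unfolding the two rules for small."""
    # Rule 1: small(X,Y) :- before(X,Y).
    result = set(before_given_y(y))
    # Rule 2: small(X,Y) :- small(X,Z), before(Z,Y).
    # Evaluate the table term before(Z, y) first; each Z is smaller than y,
    # so the recursion reaches 0 and stops because before(X, 0) is empty.
    for z in before_given_y(y):
        result |= small_given_y(z)
    return result

def goal():
    """g(W) :- small(W,2)."""
    return small_given_y(2)
```

Calling `goal()` returns the values of W, which correspond to the tuples (1,2) and (0,2) in the derivation on the previous slide.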

10 10/19 Infinite Queries
Can the top-down algorithm run forever?
- Yes
Case 1: A table that returns an infinite result
- evenProduct(X,Y) = { (X,Y) | X*Y mod 2 = 0 }
- divides(X,Y) = { (X,Y) | X mod Y = 0 }
- links(X,Y) = { (X,Y) | page X links to page Y }
weak-safety: all intermediate results are finite
Result #1 (Sagiv and Vardi ’90):
- Weak-safety is decidable given F/C (finiteness constraints) of tables
  F/C of evenProduct: None
  F/C of divides: X => Y
  F/C of links: X => Y
- Algorithm: Tracking flow of values from assigned variables
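
The finiteness constraint X => Y of divides can be illustrated with a small Python sketch. It assumes positive integers only (for X = 0 every Y would qualify), and both function names are hypothetical: with X bound the set of matching Y values is finite, while with only Y bound the matching X values form an endless stream.

```python
from itertools import count, islice

def divides_given_x(x):
    """All Y >= 1 with x mod Y == 0: the divisors of x, a finite set (assumes x >= 1)."""
    if x < 1:
        raise ValueError("this sketch assumes positive integers")
    return {y for y in range(1, x + 1) if x % y == 0}

def divides_given_y(y):
    """All X >= 1 with X mod y == 0: the multiples of y, which never run out."""
    for k in count(1):
        yield k * y

# The unbounded direction can only be sampled, never completed.
some_multiples = list(islice(divides_given_y(3), 4))
```

A weak-safety check would accept a rule that calls divides with X already assigned and reject one that relies on the other direction.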

11 11/19 Infinite Queries (cont.)
Can the top-down algorithm run forever?
- Yes
Case 2: The algorithm’s recursion never stops
- A query/table is used in its “unbounded” direction
g(W) :- small(2,W).
small(A,B) :- before(A,B).
small(X,Y) :- small(X,Z), before(Z,Y).
before(X,Y) = { (X,X+1) | X ≥ 0 }
s(X,Y) = b(X,Y) ∪ s(X,Z) ⋈ b(Z,Y)
g(W) = s(2,W)
 = b(2,W) ∪ s(2,Z) ⋈ b(Z,W)
 = {(2,3)} ∪ s(2,Z) ⋈ b(Z,W)
 = {(2,3)} ∪ [b(2,Z) ∪ s(2,Z’) ⋈ b(Z’,Z)] ⋈ b(Z,W)
 …
Results #2-3 (Sagiv and Vardi ’90):
- Termination is undecidable in the general case
- Termination is decidable if all queries are unary
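
The unbounded direction can be sketched in Python as a generator that keeps producing answers. The helper names are hypothetical, and the sketch follows the successor table before(X,Y) = { (X,X+1) | X ≥ 0 }: each answer of small(2,W) feeds another lookup, so the loop has no natural end and can only be cut off from outside.

```python
from itertools import islice

def before_given_x(x):
    """All Y with before(x, Y) for before = {(X, X+1) | X >= 0}."""
    return {x + 1} if x >= 0 else set()

def small_given_x(x):
    """Lazily yield every Y with small(x, Y); for the successor table this never ends."""
    seen = set()
    frontier = before_given_x(x)          # rule 1: small(x,Y) :- before(x,Y)
    while frontier:
        next_frontier = set()
        for y in sorted(frontier):
            if y not in seen:
                seen.add(y)
                yield y
                # rule 2: small(x,Y') :- small(x,y), before(y,Y')
                next_frontier |= before_given_x(y)
        frontier = next_frontier - seen

# Cut the evaluation off after five answers; without islice it would not stop.
first_answers = list(islice(small_given_x(2), 5))
```

The same rules terminate for small(W,2) and diverge for small(2,W), which is the sense in which termination depends on the direction in which a table is used.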

12 12/19 Infinite Queries (summ.)
- We can automatically determine weak-safety
- We cannot (automatically) determine termination
- But, one can analytically prove that a given query over a given DB is finite
  E.g., our small(W,2) program
Leading question: Can we determine the finiteness of a query? No

13 13/19 The Web as a DB
The web data model (WDM):
- A scheme of a DB that can represent the web graph
- Just three tables:
  urls = { u | u is a url of a web-page }
  links = { (u1,u2) | u1 links to u2; u1, u2 ∈ urls }
  Words = { (u,w) | w appears in page u; u ∈ urls }
Result #4 (Abiteboul and Vianu ’97):
- If a Datalog program with no literals halts over an infinite DB, its result is ∅
  => A non-trivial query (over an infinite DB) must have a literal

14 14/19 Web - Machines
Browsing Machine
- A weakly safe Datalog program (over WDM)
- At least one URL literal
Searching/Browsing Machine
- An unsafe Datalog program (over WDM)
  Evaluates queries in parallel
- Allowed literal types: URLs, Words
Claims #1-2 (Abiteboul and Vianu ’97):
- Browsing machine: Represents a user following static links from a page
- Searching/Browsing machine: Also allows the user to access a search engine

15 15/19 Discussion: Finite approximation
- Relational database servers are very popular
  Such DBs are finite
- Also, computing a table on demand may be slow
  Better performance at batch processing
The challenge: Build a finite replacement for an infinite DB
Formally:
- Given a finite query, q, over an infinite DB
  (Finiteness of q proved analytically)
- Build a finite database such that q over the finite database yields the same result as q over the infinite DB

16 16/19 Discussion: Finite approximation
Example: Our small(W,2) program
- A finite, sound table: before(X,Y) = { (0,1), (1,2) }
- A finite, unsound table: before(X,Y) = { (0,1) }
The process:
- Compute the transitive closure of the before relation
- Start from the literal ‘2’ at the right-hand side position
Condition: the table graph must end with a sink
- In before the sink is the vertex ‘0’
  => We can build a finite DB
- Sadly, in the web-graph no such sink exists
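
The process above can be sketched in Python for the before table. This is a minimal illustration, assuming the hypothetical lookup `before_given_y` for before(X,Y) = { (X,X+1) | X ≥ 0 }: starting from the literal on the right-hand side, it walks the table backwards and keeps every tuple it touches.

```python
def before_given_y(y):
    """All X with before(X, y) for before = {(X, X+1) | X >= 0}."""
    return {y - 1} if y >= 1 else set()

def finite_before_for(literal):
    """Collect the before tuples reachable backwards from `literal`."""
    table, seen, todo = set(), set(), [literal]
    while todo:
        y = todo.pop()
        if y in seen:
            continue
        seen.add(y)
        for x in before_given_y(y):
            table.add((x, y))
            todo.append(x)   # keep walking backwards from x
    return table
```

The loop stops only because the walk reaches 0, which has no predecessor in before. On a graph without such an end point, as the slide notes for the web graph, the same loop would not terminate.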

17 17/19 Discussion: Temporality
- Crawling takes time
- The subject may change while crawling
  The DB is a snapshot which never happened
- (Open question): Can we decide whether a result was really “true” at some point?

18 18/19 More issues
- Relational algebra over large relations
  BDD
- Negation
  Stratified Datalog

19 19/19 - Questions ? -


21 21/19 Datalog
Semantics: ???
Straightforward mapping to Relational Algebra??
g(X) :- small(X,2).
small(X,Y) :- before(X,Y).
small(X,Y) :- small(X,Z), before(Z,Y).

22 22/19 Example: Bottom-up evaluation
Initialization: Translate the EDBs into relations
before
  X  Y
  0  1
  1  2
  2  3

23 23/19 Example: Bottom-up evaluation
apply small(X,Y) :- before(X,Y).
before
  X  Y
  0  1
  1  2
  2  3
small
  X  Y
  0  1
  1  2
  2  3

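24 24/19 Example: Bottom-up evaluation
apply small(X,Y) :- small(X,Z), before(Z,Y).
Join small(X,Z) with before(Z,Y) and add the resulting (X,Y) tuples to small; repeat until nothing new appears.
before (Z,Y): (0,1) (1,2) (2,3)
small (X,Z) before the join: (0,1) (1,2) (2,3)
small after the first join: (0,1) (1,2) (2,3) (0,2) (1,3)
small after the second join: (0,1) (1,2) (2,3) (0,2) (1,3) (0,3)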

25 25/19 Example: Bottom-up evaluation
apply g(X) :- small(X,2).
small
  X  Y
  0  1
  1  2
  2  3
  0  2
  1  3
  0  3
g
  X
  1
  0

26 26/19 Finiteness
before(X,Y) = { (0,1) (1,2) (2,3) }
The Bottom-up algorithm:
- Init:
  For each EDB, p, assign r(p) ← Relation of all tuples satisfying p
  For each IDB, p, assign r(p) ← ∅
- Loop:
  Choose a rule p(…) :- t1(…), t2(…), … tn(…)
  t ← join of all r(ti), where 1 ≤ i ≤ n
  r(p) ← r(p) ∪ t
- Continue until a fix-point is reached
Requires: Finiteness of EDBs
Ensures: Termination
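
The bottom-up loop can be sketched in Python for the three-rule program. This is a simplified illustration rather than a general Datalog engine: the rules are hard-coded, every pass applies both rules for small, and `bottom_up` is a hypothetical name.

```python
def bottom_up(before):
    """Naive bottom-up evaluation of the small/g program over a finite `before` table."""
    small = set()                       # r(small) starts empty
    while True:
        # Rule 1: small(X,Y) :- before(X,Y).
        derived = set(before)
        # Rule 2: small(X,Y) :- small(X,Z), before(Z,Y).  (join on Z)
        derived |= {(x, y) for (x, z) in small for (z2, y) in before if z == z2}
        if derived <= small:            # fix-point: nothing new was derived
            break
        small |= derived
    # g(X) :- small(X,2).
    g = {x for (x, y) in small if y == 2}
    return small, g

small, g = bottom_up({(0, 1), (1, 2), (2, 3)})
```

Because `before` is finite, only finitely many tuples can ever be derived, so the loop is guaranteed to reach the fix-point. With the table from the slide it ends with the six small tuples shown in the example and g = {0, 1}.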

