Datalog Presented by: Michael Lam CS 157A section 1 Dr. Sin Min Lee
Outline Basic Structure/Terminolgoy (5.2.1) Syntax of Datalog Rules (5.2.2) Safety (5.2.4) Recursion in Datalog (5.2.6) Relational Operations in Datalog (5.2.5) Examples
What is Datalog? Datalog is a nonprocedural query language based on the logic- programming language Prolog. A user describes the information desired without giving a specific procedure for obtaining that information. Datalog simplifies writing simple queries and makes query optimization easier.
Terminology Variable A variable is like a variable in a mathematical equation, it represents some constant that has yet to be determined. By convention variables always start with an upper case letter, e.g., City, Firstname,Code, A, B, C, etc Constant A constant is an attribute value. By convention all constants begin with lower-case letters such as: sammy, joe, xxy2003, etc fact A fact is the same things a tuple in a relation. A fact has the following structure. (called Ground facts, because there are no variables) ( ) Example: Facts about student enrollment: The following set of facts shows some students and the subjects in which they are enrolled. enrolledIn(joe, cp2001) enrolledIn(sue, cp2001) enrolledIn(joe, cp2003). I n relational database terms, it would be the same as the following relation. enrolledIn Rules A rule is a way to derive new facts. It is part of a query.
JOE CP2001 SUECP2001 JOECP2003 The major difference in the representation is that in the Datalog facts the attributes are not named. So there is nothing to tell us that the first attribute is a Name and the second is a Code. enrolledIn
Syntax of Datalog Rules A rule has the following form :- Head is a single predicate and the body is a list of predicates. Each predicate has the form positive literal: ( ) negative literal: not ( ) What does :- mean? Assume we have a rule: Q :- P Then :- means, if P is true then Q is true Example: A rule about students The following rule identifies courses students are enrolled in.
student result(Name, Code, Phone) :- student(Name, Phone), enrolledIn(Name, Code). enrolledIn NamePhone joe NameCode joeCP2004 sueCP2001 joeCP2003 Test every combination of student and enrooledIn that unifies student(joe, ), enrolledIn(joe, CP2004) student(joe, ), enrolledIn(sue, CP2001) student(joe, ), enrolledIn(joe, CP2003) Result: result(joe, CP2004, ) result(joe, CP2003, )
Safety It may be possible to write rules that generate an an infinite number of answers. For example: employee(X,Y) :- X>Y Since the relation defining > is infinite, this rule would generate an infinite number of facts for the relation employee. Also the use of negation can cause similar problems. For example: not-in-loan(L, B, A) :- not loan(L, B, A) We what to retrieve all tuples that are not in the loan relation but if account numbers, branch-name and balance is infinite then the relation not-in-loan would also be infinite. To avoid such problems Datalog rules are required to satisfy the following safety conditions: 1. Every variable that appears in the head of the rule also appears in a nonarithmetic positive literal in the body of the rule 2. Every variable appearing in a negative literal in the body of the rule also appears in some positive literal in the body of the rule.
Recursion in Datalog Procedure I = set of facts in the database repeat Old_I = I I = I U infer( R, I) until I = Old_I Suppose now that we want to find out which employees are supervised, directly or indirectly by a given manager, "Jones". (1) People whose manager is "Jones" (2) People whose manager is supervised by "Jones" Ex. Tim manager is Jones Kelly manager is Tim employee-jones(X) :- manager(X, "Jones") employee-jones(X) :- manager(X,Y), employee-jones(Y)
employee-namemanager-name AlonBarinsky Estovar CorbinDuarte Jones EstovarJones Klinger RensalKlinger IterationTuples in employee-jones 0 1(Duarte), (Estovar) 2(Duarte), (Estovar), (Barinsky), (Corbin) 3(Duarte), (Estovar), (Barinsky), (Corbin), (Alon) 4 manager relation employee-jones(X) :- manager(X, "Jones") employee-jones(X) :- manager(X,Y), employee-jones(Y)
Relational Operations in Datalog Relational Algebra: names = Π name (student) Datalog Rule: names(Name) :- student(Name,_) Relational Algebra: joeInfo = Π studentID (σ name="joe" (student)) Datalog Rule: joeInfo(StudentID) :- student(joe,StudentID) Relational Algebra: Π coursName (σ name="joe" (student |><| courses)) Datalog Rule: joeInfo(Name,Cname) :- student(joe, StudentID), courses(Cname, StudentID) student(Name,StudentID) courses(Cname, StudentID)
Examples Relational Algebra: joeInfo = Π studentID (σ name="joe", studentID > 25 (student)) Datalog Rule: joeInfo(StudentID) :- student(joe,StudentID), StudentID > 25 Relational Algebra: Π cName (courses) |><| Π name (student) Datalog Rule: joeInfo(Name,Cname) :- student(Name, StudentID), courses(Cname, StudentID) Relational Algebra: Π studentID (σ name "joe", studentID > 25 (student)) Datelog Rule: joeInfo(StudentID) :- student(Name,StudentID), StudentID > 25, Name <> joe