Presentation is loading. Please wait.

Presentation is loading. Please wait.

Proper Refinement of Datalog Clauses using Primary Keys

Similar presentations


Presentation on theme: "Proper Refinement of Datalog Clauses using Primary Keys"— Presentation transcript:

1 Proper Refinement of Datalog Clauses using Primary Keys
Siegfried Nijssen and Joost N. Kok BNAIC-2003, Nijmegen

2 Introduction Inductive Logic Programming algorithm:
C: Set of Datalog clauses, initially empty D: Database of facts (Knowledge base) repeat make clauses in C more specific (downward refinement) evaluate C against D In this presentation, I will present some general ideas that can be applied to any ILP algorithm which has this form. I will first discuss how “traditional” algorithms perform the second step, the evaluation, and then how many classical algorithms perform refinement. After that, I will present my ideas for these steps. 1. 2. October 24, 2003, Nijmegen BNAIC-2003

3 Database of Facts g1 g2 {e(g1,n1,n2,a),e(g1,n2,n1,a),e(g1,n2,n3,a), e(g1,n3,n1,b),e(g1,n3,n4,b),e(g1,n3,n5,c), e(g2,n6,n7,b)} n2 n4 n6 a a b b a n1 n5 n7 b c n3 October 24, 2003, Nijmegen BNAIC-2003

4 Clause b a k(G)  e(G,N1,N2,a),e(G,N2,N3,a), e(G,N1,N4,a),e(G,N4,N5,b)
October 24, 2003, Nijmegen BNAIC-2003

5 Evaluation of a clause -subsumption: D C iff there is a substitution , (C)  D Database D: {e(g1,n1,n2,a),e(g1,n2,n1,a),e(g1,n2,n3,a), e(g1,n3,n1,b),e(g1,n3,n4,b),e(g1,n3,n5,c), e(g2,n6,n7,b)} Clause C: k(G)  e(G,N1,N2,a),e(G,N2,N3,a), e(G,N1,N4,a),e(G,N4,N5,b) ={G/g1,N1/n2,N2/n1,N3/n2,N4/n3,N5/n1} The first step to determine the frequency of a query, is to determine a subsumption operator. The most well-known subsumption operator is -subsumption. October 24, 2003, Nijmegen BNAIC-2003

6 Evaluation of a clause a b a a b b b a b a a b b a g1 g2 n2 n4 n6 n1
October 24, 2003, Nijmegen BNAIC-2003

7 Evaluation of a clause a a b b b a b a a a Equivalent g1 g2 n2 n4 n6
October 24, 2003, Nijmegen BNAIC-2003

8 Evaluation of a clause k(G)  e(G,N1,N2,a) k(G)  e(G,N1,N2,a),
Equivalent N3 October 24, 2003, Nijmegen BNAIC-2003

9 Evaluation of a clause OI-subsumption: D C iff there is a substitution , (C)  D, while:  is injective  does not map to constants in C N3 a N1 a N2 N1 a N2 October 24, 2003, Nijmegen BNAIC-2003

10 Clause Refinement - modes
User defined Refinement using modes [Progol,Aleph, Warmr,Tilde,Farmer] T={k(G),e(G,N,N,L)} M={e(+,-,-,#),e(+,+,-,#)} k(G) + old variable - new variable # constant e(G,N1,N2,a) a ,e(G,N2,N1,a) b Using M only edge labeled trees can be constructed! ,e(G,N1,N3,a) a October 24, 2003, Nijmegen BNAIC-2003

11 Clause Refinement - modes
k(G) e(G,N1,N2,a) k(G) e(G,N1,N2,a),e(G,N1,N3,a) k(G) e(G,N1,N2,a),e(G,N1,N3,a), e(G,N2,N4,a),e(G,N3,N5,b) Complete & proper refinement is possible with OI-subsumption, not with -subsumption. a b a a October 24, 2003, Nijmegen BNAIC-2003

12 Refinement using Primary Keys
Assume we know: between a pair of nodes there is at most one edge with one label How to incorporate this knowledge in the refinement operator? M={e(+,-,-,#),e(+,+,-,#),e(+,+,+,#)} These modes allow: k(G)  e(G,N1,N2,a), e(G,N1,N2,b) k(G)  e(G,N1,N2,a), e(G,N1,N2,L1) Primary key: {1,2,3} (first 3 arguments of e) October 24, 2003, Nijmegen BNAIC-2003

13 Expressiveness OI vs -subsumption
a ? b k(G) e(G,N1,N2,a),e(G,N2,N3,L),e(G,N3,N4,b) For proper and complete refinement: OI is required Under OI: L  a, L  b Assume that, FOR THE PREVIOUS DATABASE WITH LABELS ON THE EDGES, you would like to express the following pattern. October 24, 2003, Nijmegen BNAIC-2003

14 In an ideal situation... We have complete & proper refinement
We are not required to use OI for all types (weak Object Identity) October 24, 2003, Nijmegen BNAIC-2003

15 Proper refinement using Primary Keys
In many cases, this ideal situation exists for refinement using primary keys! k(G) e(G,N1,N2,a),e(G,N2,N3,L),e(G,N3,N4,b) each atom must differ from every other atom in at least one literal of the primary key if this clause is equivalent with a smaller clause, there must be substitution that maps a variable to another variable/constant in the clause no substitution exists for L: primary key differs October 24, 2003, Nijmegen BNAIC-2003

16 Proper refinement using Primary Keys
T={k(G),e(G,N,N,L),t(L,C)} M={e(+,-,-,-),e(+,+,-,-),t(+,#)} K(p)={1,2,3} K(t)={1,2} OI={G,N} k(G) e(G,N1,N2,L1),t(L1,a), e(G,N1,N3,L2),t(L2,a) Now consider this example, in which we have added a predicate t. There is one mode for this predicate, and all its arguments are part of the same primary key. Consider the following clause, which is part of the search space. Can it be equivalent to a shorter clause? This can only be the case if L1 is unified with another variable or with a constant. However, if we unify L1 with L2, two e atoms have the same last arguments, which is not possible for the given modes, as each mode for e defines that the last argument must be new. This clause can therefore not be equivalent with a smaller clause in the search space. October 24, 2003, Nijmegen BNAIC-2003

17 Proper refinement using Primary Keys
Given predicates, types, modes, primary keys and a partition of types into OI and non-OI We prove refinement is proper and complete if for every mode there is a primary key which does not include any non-OI output. October 24, 2003, Nijmegen BNAIC-2003

18 Conclusions Higher performance for ILP algorithms Higher flexibility
primary keys restrict the search space efficiently refinement is proper Higher flexibility weak OI is more flexible than full OI October 24, 2003, Nijmegen BNAIC-2003

19 Clause refinement - other representation better?
Background knowledge: a(G,N1,N2)  e(G,N1,N2,a) b(G,N1,N2)  e(G,N1,N2,b) e(G,N1,N2)  e(G,N1,N2,L) k(G) a(G,N1,N2),e(G,N2,N3),b(G,N3,N4) k(G) a(G,N1,N2),e(G,N1,N2) October 24, 2003, Nijmegen BNAIC-2003


Download ppt "Proper Refinement of Datalog Clauses using Primary Keys"

Similar presentations


Ads by Google