WIMS 2014, June 2-4, Thessaloniki, Greece
Optimized Backward Chaining Reasoning System for a Semantic Web
Hui Shi, Kurt Maly, and Steven Zeil
Contact:

Outline
Problem
–Semantic web subject to changes
–How to scale a reasoner to big data?
Background
–Knowledge base using ontologies
–Inference strategies
–Benchmarks
–Query optimization
Integrated optimized backward chaining
–Selection function
–Switching resolution methods
–Avoidance of non-termination (OLDT)
–owl:sameAs optimization
Evaluation
Conclusions

Problem
Efficiency of reasoning in the face of large scale and frequent changes within a question/answer system over a semantic web
Issue
–Forward chaining scales well for fixed knowledge bases
–Backward chaining can handle changes in the knowledge base but does not scale

Background
Existing semantic applications: question/answer systems
–Libra, Cimple, Arnetminer
Semantic Web
–Resource Description Framework (RDF)
–Web Ontology Language (OWL) for specific knowledge domains
–SPARQL query language for RDF
–SWRL rule language
Reasoning systems
–Jena (proprietary Jena rules)
–Pellet and KAON2
–Oracle 11g
–OWLIM

Background
Knowledge base (KB)
–Ontologies
–Representation formalism: Description Logic (DL)
Inference methods for First Order Logic
–Materialization and forward chaining
  pre-computes inferred truths and starts with the known data
  suitable for frequent computation of answers with data that are relatively static
  examples: OWLIM and Oracle
–Query-rewriting and backward chaining
  expands the queries and starts with goals
  suitable for efficient computation of answers with data that are dynamic and queries that are infrequent
  example: Virtuoso

Background
Benchmarks evaluate and compare the performance of different reasoning systems
–The Lehigh University Benchmark (LUBM)
–The University Ontology Benchmark (UOBM)

Background
Query optimization: issues
–Optimization of queries (conjunctions of individual clauses) over databases is well understood
–Interposing a reasoner introduces uncertainty about the size of the solution space associated with resolving individual clauses
–Query optimization in the presence of such uncertainty: Dynamic Optimization with an Interposed Reasoner
  A greedy ordering of the proofs of the individual clauses according to the estimated sizes anticipated for the proof results
  Deferring joins of results from individual clauses where such joins are likely to result in excessive combinatorial growth of the intermediate solution

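The greedy ordering described above can be sketched as follows. This is a simplified, static illustration, not the authors' implementation: the function names and the size estimates in `ESTIMATES` are hypothetical, and a dynamic optimizer would re-estimate sizes after each clause is proven rather than sorting once.

```python
from typing import Callable, List, Tuple

# A clause is a triple pattern; terms starting with '?' are variables.
Clause = Tuple[str, str, str]

def greedy_order(clauses: List[Clause],
                 estimate: Callable[[Clause], int]) -> List[Clause]:
    """Return the clauses sorted so the smallest estimated result is proven first."""
    return sorted(clauses, key=estimate)

# Hypothetical size estimates per predicate (illustrative numbers only).
ESTIMATES = {"type": 5000, "takesCourse": 80000, "teacherOf": 1500}

def estimate_by_predicate(clause: Clause) -> int:
    # Unknown predicates are treated as very large so they are proven last.
    return ESTIMATES.get(clause[1], 10**9)

query = [
    ("?x", "takesCourse", "?y"),
    ("?y", "type", "Course"),
    ("?z", "teacherOf", "?y"),
]
ordered = greedy_order(query, estimate_by_predicate)
```

Proving the clause with the smallest anticipated result first keeps the intermediate solution small before it is joined with larger results.
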
Hybrid reasoner
Motivation example
–Assume a fully materialized KB
–Harvester adds a new fact: student0 enrolledIn course0
–Query "Who is enrolled in course0?": ok
–Assume the fact Prof0 teaches course0 is in the KB
–Query "Who is being taught by Prof0?": not ok as a simple lookup; needs reasoning with a rule such as:
  isTaughtBy(?Student, ?Faculty) :- enrolledIn(?Student, ?Course), teaches(?Faculty, ?Course)

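The difference between the two queries can be illustrated with a toy fact store. This is a minimal sketch assuming facts are stored as (subject, predicate, object) tuples; the function names are hypothetical, and the rule is evaluated at query time rather than being materialized.

```python
# Toy knowledge base: the harvested fact plus the teaching fact.
facts = {
    ("student0", "enrolledIn", "course0"),
    ("Prof0", "teaches", "course0"),
}

def enrolled_in(course, kb):
    """Simple lookup: answered directly from stored facts."""
    return sorted(s for (s, p, o) in kb if p == "enrolledIn" and o == course)

def taught_by(faculty, kb):
    """Goal-directed evaluation of the rule
    isTaughtBy(S, F) :- enrolledIn(S, C), teaches(F, C).
    The rule body is resolved when the query arrives, so newly
    added facts are taken into account without re-materializing."""
    courses = {o for (s, p, o) in kb if s == faculty and p == "teaches"}
    return sorted(s for (s, p, o) in kb if p == "enrolledIn" and o in courses)

print(enrolled_in("course0", facts))
print(taught_by("Prof0", facts))
```

No isTaughtBy fact is stored anywhere, so a plain lookup for it would return nothing; the answer only appears once the rule body is resolved against the current facts.
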
Optimized Backward Chaining
Problem
–Generate a query response for a given query pattern based on a specific rule set (RDFS, Horst, custom)
Four Optimizations
–Ordered Selection Function
–Switching between Binding Propagation and Free Variable Resolution
–Avoid Repetition and Non-Termination (OLDT)
–owl:sameAs Optimization

Dynamic Selection of Propagation Mode
Suppose that:
–we have a rule body containing clauses (?x p1 ?y) and (?y p2 ?z)
–we have already proven that the first clause can be satisfied using value pairs $\{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\}$

Dynamic Selection of Propagation Mode
Binding propagation mode
–the bindings from the earlier solutions are substituted into the upcoming clause to yield multiple instances of that clause as goals for subsequent proof
–($y_1$ p2 ?z), ($y_2$ p2 ?z), …, ($y_n$ p2 ?z)
Free variable resolution mode
–a single proof is attempted of the upcoming clause in its original form, with no restriction upon the free variables in that clause
–(?y p2 ?z)

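The goals produced by the two modes can be sketched as follows. This is an illustration only, assuming clauses are represented as 3-tuples of strings; the function names and the sample values are hypothetical.

```python
def binding_propagation_goals(next_clause, var, values):
    """One goal per earlier solution: substitute each known value for `var`."""
    return [tuple(v if term == var else term for term in next_clause)
            for v in values]

def free_variable_goal(next_clause):
    """A single goal: the clause in its original form, variables left free."""
    return [next_clause]

clause2 = ("?y", "p2", "?z")
y_values = ["y1", "y2", "y3"]  # values of ?y from the proof of the first clause

bp_goals = binding_propagation_goals(clause2, "?y", y_values)
fv_goals = free_variable_goal(clause2)
```

Binding propagation issues n narrow proofs, while free variable resolution issues one broad proof whose results must then be joined with the earlier solutions.
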
Dynamic Selection of Propagation Mode: Example
Suppose we have an earlier body clause 1: "?y type Course" and a subsequent body clause 2: "?x takesCourse ?y".
–1.749 seconds to prove body clause 1
–average of 0.235 seconds to prove body clause 2 for a given value of ?y from the proof of body clause 1
–86,361 students satisfying variable ?x
–0.235 × 86,361 ≈ 20,295 seconds with binding propagation
–2.612 seconds to resolve the second clause in free variable resolution mode

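The arithmetic in this example can be checked directly. The sketch below uses only the timings quoted on the slide; the variable names are illustrative.

```python
# Timings quoted in the example (seconds).
avg_proof_clause2_bound = 0.235   # clause 2 with one value substituted
n_bindings = 86_361               # number of substituted instances
proof_clause2_free = 2.612        # clause 2 proven once with free variables

# Binding propagation proves clause 2 once per binding.
binding_propagation_cost = avg_proof_clause2_bound * n_bindings

# Free variable resolution proves clause 2 a single time.
free_variable_cost = proof_clause2_free

print(round(binding_propagation_cost))            # total seconds, binding propagation
print(binding_propagation_cost > free_variable_cost)
```

With this many bindings, the many small proofs add up to several hours, while the single unrestricted proof finishes in seconds (join costs are left out of this comparison).
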
Dynamic Selection of Propagation Mode
–Dynamically switch between modes based upon the size of the partial solutions obtained
  Let n denote the number of solutions that satisfy an already proven clause
  Let t denote the threshold used to dynamically select between modes
  If n ≤ t, then the binding propagation mode will be selected
  If n > t, then the free variable resolution mode will be selected
  The larger the threshold is, the more likely it is that binding propagation mode will be selected

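The selection rule amounts to a single comparison. A minimal sketch, with a hypothetical function name and string labels for the two modes:

```python
def select_mode(n: int, t: int) -> str:
    """Choose the propagation mode for the next clause.

    n: number of solutions of the already proven clause
    t: threshold separating the two modes
    """
    if n <= t:
        return "binding_propagation"
    return "free_variable_resolution"

print(select_mode(5, 10))
print(select_mode(11, 10))
```

Raising t widens the range of n for which binding propagation is chosen, which matches the last point above.
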
Calculation of Threshold t
–Let $\mathit{join}_1$ denote the time spent on the join operations in binding propagation mode
–Let $\mathit{join}_2$ denote the time spent on the join operations in free variable resolution mode
–Let $\mathit{proof1}_i$ denote the time of proving the first clause with $i$ free variables, and let $\mathit{proof2}_j$ be the average time of proving the new specialized form with $j$ free variables ($i \in [1,3]$, $j \in [0,2]$)
–Let $\mathit{proof3}_k$ denote the time of proving the second clause with $k$ free variables ($k \in [1,3]$)
Compare the time spent in binding propagation mode and in free variable resolution mode to determine t. Binding propagation is favored when
$$\mathit{proof1}_i + \mathit{proof2}_j \cdot n + \mathit{join}_1 < \mathit{proof1}_i + \mathit{proof3}_k + \mathit{join}_2$$
Cancelling $\mathit{proof1}_i$ and neglecting the difference between the join costs gives $n < \mathit{proof3}_k / \mathit{proof2}_j$, hence
$$t = \left\lfloor \mathit{proof3}_k / \mathit{proof2}_j \right\rfloor$$

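The threshold formula can be written as a small helper. This is a sketch under the assumption that the two join costs are comparable and can be ignored; the function name is hypothetical. The sample call reuses the timings from the earlier course example (2.612 s unrestricted, 0.235 s per specialized goal).

```python
import math

def threshold(proof3_k: float, proof2_j: float) -> int:
    """t = floor(proof3_k / proof2_j).

    proof3_k: time to prove the second clause with k free variables
    proof2_j: average time to prove one specialized instance with j free variables
    """
    if proof2_j <= 0:
        raise ValueError("proof2_j must be positive")
    return math.floor(proof3_k / proof2_j)

t = threshold(2.612, 0.235)
print(t)
```

With these figures, binding propagation would only pay off for about a dozen bindings or fewer, far below the 86,361 in the example.
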
Calculation of Threshold t
To estimate $\mathit{proof3}_k$ and $\mathit{proof2}_j$
–we record the time spent on proving goals with different numbers of free variables
–after we have recorded a sufficient number of proof times, we compute the average time spent on goals with k free variables and j free variables respectively
Start with a historical default value
Update the threshold several times when answering a particular query

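One way to keep these running averages is sketched below. The class name, the `min_samples` cutoff, and the default threshold value are assumptions made for illustration; the slides do not say how many samples count as sufficient or what the historical default is.

```python
import math
from collections import defaultdict

class ProofTimeStats:
    """Running average of proof times, keyed by number of free variables."""

    def __init__(self, default_threshold: int, min_samples: int = 3):
        self.default_threshold = default_threshold  # historical default (assumed value)
        self.min_samples = min_samples              # assumed notion of "sufficient"
        self._total = defaultdict(float)
        self._count = defaultdict(int)

    def record(self, free_vars: int, seconds: float) -> None:
        self._total[free_vars] += seconds
        self._count[free_vars] += 1

    def average(self, free_vars: int):
        """Average proof time, or None if too few samples were recorded."""
        if self._count[free_vars] < self.min_samples:
            return None
        return self._total[free_vars] / self._count[free_vars]

    def threshold(self, k: int, j: int) -> int:
        """floor(avg time with k free vars / avg time with j free vars),
        falling back to the default until both averages are available."""
        proof3, proof2 = self.average(k), self.average(j)
        if proof3 is None or not proof2:
            return self.default_threshold
        return math.floor(proof3 / proof2)

stats = ProofTimeStats(default_threshold=100, min_samples=2)
before = stats.threshold(k=2, j=1)          # not enough data yet
for seconds in (0.5, 0.5):
    stats.record(1, seconds)
for seconds in (4.0, 6.0):
    stats.record(2, seconds)
after = stats.threshold(k=2, j=1)           # recomputed from recorded times
```

Calling `threshold` again after further `record` calls yields the repeated updates within a single query that the slide describes.
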
Evaluation
Time (ms) per query for each propagation strategy

Query       Dynamic selection   Binding propagation only   Free variable resolution only
Query 1     …
Query 2     1,060               1,341                      21,278
Query 3     …
Query 4     …,572
Query 5     …,323
Query 6     1,170               592,944                    19,968
Query 7     1,341               551,822                    20,217
Query 8     1,684               513,773                    40,061
Query 9     1,591               524,787                    20,841
Query 10    …,07819,734
Query 11    …,141
Query 12    …,313
Query 13    …,528
Query 14    …

Overall Performance
Time (ms)

                LUBM(1)                      LUBM(40)
                Opt. Backwd    OWLIM-SE      Opt. Backwd    OWLIM-SE
Loading time    2,900          9,600         95,000         350,000
Query 1         …,40026
Query 2         …,1005,100
Query 3         …
Query 4         …,90014
Query 5         …
Query 6         …,0005,300
Query 7         …,00054
Query 8         …,0003,000
Query 9         …,0004,400
Query 10        …,
Query 11        …
Query 12        …,60011
Query 13        …
Query 14        …,2002,500

Data set sizes: LUBM(1) = 100,839; LUBM(40) = 5,307,754

Conclusions
–We have developed optimizations for a backward chaining algorithm
–The new optimized algorithm outperformed one of the best forward-chaining reasoners (OWLIM-SE) in scenarios where the knowledge base is subject to frequent change