Service Computation 2013, Valencia, Spain
Query Optimization in Cooperation with an Ontological Reasoning Service
Hui Shi, Kurt Maly, and Steven Zeil



Outline
Problem
  – What are we reasoning about?
  – What are the challenges?
Background
  – Knowledge base using ontologies
  – Inference strategies
  – Query optimization methods
  – Benchmarks
Dynamic Query Optimization with an Interposed Reasoner
  – Greedy ordering
  – Deferral of joins
Evaluation
  – Comparison against Jena
Conclusions

Problem
Efficiency of reasoning in the face of large scale and frequent change, within a question/answer system over a semantic web.
Issues
  – Optimization of queries (conjunctions of individual clauses) over databases is well understood.
  – Interposing a reasoner introduces uncertainty about the size of the solution space associated with resolving individual clauses.
  – Query optimization must therefore work in the presence of such uncertainty.

Background
Existing semantic application
  – Question/answer systems over the domain of (researchers, publications, subjects)
Knowledge base (KB)
  – Ontologies
  – Representation formalism: Description Logic (DL)
Inference methods for First Order Logic
  – Materialization and forward chaining
      – pre-computes inferred truths and starts with the known data
      – suitable for frequent computation of answers with data that are relatively static
      – examples: Owlim and Oracle
  – Query-rewriting and backward chaining
      – expands the queries and starts with goals
      – suitable for efficient computation of answers when data are dynamic and queries are infrequent
      – example: Virtuoso
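The contrast between the two inference strategies can be illustrated on a toy knowledge base. The sketch below is not taken from the slides: the triple format, the single subclass rule, the sample data, and the function names `forward_chain` and `backward_chain` are all assumptions made for illustration.

```python
# Toy knowledge base of (subject, predicate, object) triples. Hypothetical data.
FACTS = {
    ("alice", "type", "GraduateStudent"),
    ("bob", "type", "Professor"),
}
# One illustrative rule: X type Sub, Sub subClassOf Super  =>  X type Super.
SUBCLASS_OF = {
    "GraduateStudent": "Student",
    "Student": "Person",
    "Professor": "Person",
}


def forward_chain(facts):
    """Materialization: start from the known data and add inferred triples
    until nothing new can be derived."""
    closure = set(facts)
    changed = True
    while changed:
        changed = False
        for subject, predicate, obj in list(closure):
            if predicate == "type" and obj in SUBCLASS_OF:
                inferred = (subject, "type", SUBCLASS_OF[obj])
                if inferred not in closure:
                    closure.add(inferred)
                    changed = True
    return closure


def backward_chain(facts, subject, cls):
    """Start from the goal (subject type cls) and expand it into subgoals."""
    if (subject, "type", cls) in facts:
        return True
    # The goal also holds if the subject belongs to a direct subclass of cls.
    return any(
        backward_chain(facts, subject, sub)
        for sub, sup in SUBCLASS_OF.items()
        if sup == cls
    )
```

Forward chaining pays the inference cost once, up front, and must redo it when the facts change; backward chaining pays per query but always sees the current facts.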

Background
In conventional database management systems, query optimization
  – examines multiple query plans and
  – selects one that optimizes the time to answer a query
In the Semantic Web, SPARQL optimization is typically based on
  – selectivity estimations
  – graph optimization
  – cost models

Background
Benchmarks evaluate and compare the performance of different reasoning systems
  – The Lehigh University Benchmark (LUBM)
  – The University Ontology Benchmark (UOBM)

Approach
Dynamic Optimization with an Interposed Reasoner
  – A greedy ordering of the proofs of the individual clauses, according to the estimated sizes anticipated for the proof results
  – Deferring joins of results from individual clauses where such joins are likely to result in excessive combinatorial growth of the intermediate solution

Greedy Ordering - Example
Suppose there are 10,000 students, 500 courses, 50 faculty members and 10 departments in the knowledge base, and the query pattern is (?S takesCourse ?C): what courses do students take?
Estimate of response size
  – exploiting the fact that each pattern represents the application of a predicate with known domain and range types
  – accumulating statistics on typical response sizes for previously encountered patterns involving that predicate
For the example, an estimate might be 100,000 if the average number of courses a student has taken is ten, although the number of possibilities is 5,000,000 (10,000 students × 500 courses).
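The two sources of estimates described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class `ResponseSizeEstimator`, its method names, and the choice of a simple mean over past responses are hypothetical, and only fully unbound patterns such as (?S takesCourse ?C) are handled.

```python
# Class sizes from the slide's example knowledge base.
CLASS_SIZES = {"Student": 10_000, "Course": 500, "Faculty": 50, "Department": 10}
# Assumed domain and range types of each predicate.
PREDICATE_TYPES = {"takesCourse": ("Student", "Course")}


class ResponseSizeEstimator:
    """Estimates the response size of a pattern with both terms unbound."""

    def __init__(self):
        self._observed = {}  # predicate -> list of response sizes seen so far

    def record(self, predicate, response_size):
        """Accumulate statistics from a previously answered pattern."""
        self._observed.setdefault(predicate, []).append(response_size)

    def upper_bound(self, predicate):
        """Number of possibilities implied by the domain and range types."""
        domain, range_ = PREDICATE_TYPES[predicate]
        return CLASS_SIZES[domain] * CLASS_SIZES[range_]

    def estimate(self, predicate):
        """Prefer observed statistics; fall back to the type-based bound."""
        history = self._observed.get(predicate)
        if history:
            return sum(history) / len(history)
        return self.upper_bound(predicate)
```

With no history, `estimate("takesCourse")` falls back to the type-based bound of 10,000 × 500; after one observed response of 100,000 tuples it returns that figure instead.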

Greedy Ordering - Example
Let the query be: list all cases where any student took two courses from a specific faculty member.
We can represent this query as the sequence of patterns in the following table.

| Clause # | QueryPattern | Query Response | Response Size |
|---|---|---|---|
| 1 | ?S1 takesCourse ?C1 | {(?S1=>s_i, ?C1=>c_i)} i=1..100,000 | 100,000 |
| 2 | ?S1 takesCourse ?C2 | {(?S1=>s_j, ?C2=>c_j)} j=1..100,000 | 100,000 |
| 3 | ?C1 taughtBy fac1 | {(?C1=>c_j)} j=1..3 | 3 |
| 4 | ?C2 taughtBy fac1 | {(?C2=>c_j)} j=1..3 | 3 |

Greedy Ordering - Example
Storage requirement for joins:
  – Input size plus input size plus result size
Processing complexity (using hashing to represent one set, then linear over the other set):
  – Max(result size, input size)
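The cost model above corresponds to a conventional hash join. The following sketch is an illustration only: the representation of a query response as a list of dictionaries from variable names to values, and the function name `hash_join`, are assumptions not found in the slides.

```python
def hash_join(left, right):
    """Natural join of two lists of variable bindings (dicts).

    One input is hashed on the shared variables, and the other is scanned
    linearly. Returns the joined rows and the storage used, counted as
    input size + input size + result size.
    """
    if not left or not right:
        return [], len(left) + len(right)
    shared = sorted(set(left[0]) & set(right[0]))

    # Build phase: hash one set on the values of the shared variables.
    table = {}
    for row in left:
        key = tuple(row[v] for v in shared)
        table.setdefault(key, []).append(row)

    # Probe phase: one linear pass over the other set.
    result = []
    for row in right:
        key = tuple(row[v] for v in shared)
        for match in table.get(key, []):
            result.append({**match, **row})

    storage = len(left) + len(right) + len(result)
    return result, storage
```

When the inputs share no variable, every row hashes to the same empty key and the join degenerates into a full cross product, which is the combinatorial growth the later slides aim to avoid.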

Greedy Ordering - Example
Assume first that the patterns are processed in the order given.
The worst case (storage size) is the join of clause 2, when the join of two sets of size 100,000 yields 1,000,000 tuples.

| Clause Being Joined | Clause Evaluation | Resulting SolutionSpace | SolutionSpace Size |
|---|---|---|---|
| (initial) | | [ ] | 0 |
| 1 | {(?S1=>s_i, ?C1=>c_i)} i=1..100,000 | [{(?S1=>s_i, ?C1=>c_i)} i=1..100,000] | 100,000 |
| 2 | {(?S1=>s_j, ?C2=>c_j)} j=1..100,000 | [{(?S1=>s_i, ?C1=>c_i, ?C2=>c_i)} i=1..1,000,000] (based on an average of 10 courses / student) | 1,000,000 |
| 3 | {(?C1=>c_j)} j=1..3 | [{(?S1=>s_i, ?C1=>c_i, ?C2=>c_i)} i= ] (Joining this clause discards courses taught by other faculty.) | |
| 4 | {(?C2=>c_j)} j=1..3 | [{(?S1=>s_i, ?C1=>c_i, ?C2=>c_i)} i=1..60] | 60 |

Greedy Ordering
Assume that the same patterns are processed in ascending order of estimated size, as shown in the following table.
The worst case (storage size) is the final addition of clause 2, when a set of size 100,000 is joined with the current solution space (size 270 in the table).

| Clause Being Joined | Clause Evaluation | Resulting SolutionSpace | SolutionSpace Size |
|---|---|---|---|
| (initial) | | [ ] | 0 |
| 3 | {(?C1=>c_j)} j=1..3 | [{(?C1=>c_i)} i=1..3] | 3 |
| 4 | {(?C2=>c_j)} j=1..3 | [{(?C1=>c_i, ?C2=>c_j)} i=1..3, j=1..3] | 3 |
| 1 | {(?S1=>s_i, ?C1=>c_i)} i=1..100,000 | [{(?S1=>s_i, ?C1=>c_i, ?C2=>c'_i)} i=1..270] | 270 |
| 2 | {(?S1=>s_j, ?C2=>c_j)} j=1..100,000 | [{(?S1=>s_i, ?C1=>c_i, ?C2=>c_i)} i=1..60] | 60 |
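The effect of the join order on intermediate sizes can be reproduced at a much smaller scale. The sketch below is hedged: the toy data (4 students, 3 courses, 2 of them taught by fac1), the helper names `natural_join` and `peak_solution_size`, and the use of actual response sizes in place of estimates are all assumptions made for illustration.

```python
def natural_join(left, right):
    """Nested-loop natural join over lists of binding dicts (toy sizes only)."""
    shared = set(left[0]) & set(right[0]) if left and right else set()
    return [
        {**a, **b}
        for a in left
        for b in right
        if all(a[v] == b[v] for v in shared)
    ]


def peak_solution_size(responses):
    """Join the responses in the given order.

    Returns the final rows and the largest intermediate solution space.
    """
    solution = responses[0]
    peak = len(solution)
    for response in responses[1:]:
        solution = natural_join(solution, response)
        peak = max(peak, len(solution))
    return solution, peak


# Hypothetical toy data: every student takes every course; fac1 teaches c0, c1.
enrolled = [(f"s{i}", f"c{j}") for i in range(4) for j in range(3)]
clause1 = [{"?S1": s, "?C1": c} for s, c in enrolled]
clause2 = [{"?S1": s, "?C2": c} for s, c in enrolled]
clause3 = [{"?C1": c} for c in ("c0", "c1")]
clause4 = [{"?C2": c} for c in ("c0", "c1")]

in_order = [clause1, clause2, clause3, clause4]
greedy = sorted(in_order, key=len)  # ascending response size

_, peak_in_order = peak_solution_size(in_order)
_, peak_greedy = peak_solution_size(greedy)
print(peak_in_order, peak_greedy)
```

Both orders produce the same final answer; only the size of the largest intermediate result differs.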

Deferring joins - Example
Suppose that we are processing the query: what mathematics courses are taken by computer science majors?
Assume the Math department teaches 150 different courses, there are 1,000 students in the CS Dept, and there are 500 faculty overall, with 50 in Math.
The query is represented as the sequence of the following QueryPatterns.

| Clause | QueryPattern | Query Response | Response Size |
|---|---|---|---|
| 1 | (?S1 takesCourse ?C1) | {(?S1=>s_j, ?C1=>c_j)} j=1..100,000 | 100,000 |
| 2 | (?S1 memberOf CSDept) | {(?S1=>s_j)} j=1..1,000 | 1,000 |
| 3 | (?C1 taughtby ?F1) | {(?C1=>c_j, ?F1=>f_j)} j=1..1,500 | 1,500 |
| 4 | (?F1 worksFor MathDept) | {(?F1=>f_i)} i=1..50 | 50 |

Deferring joins - Example
Assume the greedy ordering that we have already advocated, with all joins done immediately.
The worst step in this trace is the final join, between sets of size 100,000 and 150,000.

| Clause Being Joined | Clause Evaluation | Resulting SolutionSpace | SolutionSpace Size |
|---|---|---|---|
| (initial) | | [ ] | 0 |
| 4 | {(?F1=>f_i)} i=1..50 | [{(?F1=>f_i)} i=1..50] | 50 |
| 2 | {(?S1=>s_j)} j=1..1,000 | [{(?F1=>f_i, ?S1=>s_i)} i=1..50,000] | 50,000 |
| 3 | {(?C1=>c_j, ?F1=>f_j)} j=1..1,500 | [{(?F1=>f_i, ?S1=>s_i, ?C1=>c_i)} i=1..150,000] | 150,000 |
| 1 | {(?S1=>s_j, ?C1=>c_j)} j=1..100,000 | [{(?F1=>f_i, ?S1=>s_i, ?C1=>c_i)} i=1..1,000] | 1,000 |

Deferring joins - Example
Joins are carried out immediately only if the input QueryResponses share at least one variable; otherwise the join is deferred.
When a join is carried out, the input QueryResponse sets in the solution space are replaced with the result of the join.
The worst join performed is then between sets of size 100,000 and 150, a considerable improvement over the non-deferred case.

| Clause Being Joined | Query Response | Resulting SolutionSpace | SolutionSpace Size |
|---|---|---|---|
| (initial) | | [ ] | 0 |
| 4 | {(?F1=>f_i)} i=1..50 | [{(?F1=>f_i)} i=1..50] | 50 |
| 2 | {(?S1=>s_j)} j=1..1,000 | [{(?F1=>f_i)} i=1..50, {(?S1=>s_j)} j=1..1,000] | (50, 1,000) |
| 3 | {(?C1=>c_j, ?F1=>f_j)} j=1..1,500 | [{(?F1=>f_i, ?C1=>c_i)} i=1..150, {(?S1=>s_j)} j=1..1,000] | (150, 1,000) |
| 1 | {(?S1=>s_j, ?C1=>c_j)} j=1..100,000 | [{(?F1=>f_i, ?S1=>s_i, ?C1=>c_i)} i=1..1,000] | 1,000 |
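The deferral rule can be sketched as a solution space that holds several independent binding sets and joins only those that share a variable with the incoming response. This is a minimal illustration under assumptions: the class `SolutionSpace`, its `defer` flag, the `peak` counter, and the scaled-down data are hypothetical and not taken from the slides.

```python
def natural_join(left, right):
    """Nested-loop natural join over lists of binding dicts (toy sizes only)."""
    shared = set(left[0]) & set(right[0]) if left and right else set()
    return [
        {**a, **b}
        for a in left
        for b in right
        if all(a[v] == b[v] for v in shared)
    ]


class SolutionSpace:
    """Solution space kept as a list of independent binding sets."""

    def __init__(self, defer=True):
        self.defer = defer
        self.components = []
        self.peak = 0  # largest binding set ever stored

    def restrict_to(self, response):
        variables = set(response[0]) if response else set()
        # With deferral, join only components sharing a variable with the
        # response; without it, join every component immediately.
        to_join = [
            c for c in self.components
            if not self.defer or (c and variables & set(c[0]))
        ]
        self.components = [
            c for c in self.components if not any(c is j for j in to_join)
        ]
        merged = response
        for component in to_join:
            merged = natural_join(component, merged)
        self.components.append(merged)
        self.peak = max(self.peak, len(merged))

    def final_join(self):
        """Join whatever independent components remain."""
        result = [{}]
        for component in self.components:
            result = natural_join(result, component)
        return result


# Hypothetical scaled-down data, processed in the greedy order 4, 2, 3, 1.
math_faculty = [{"?F1": "f0"}, {"?F1": "f1"}]
cs_students = [{"?S1": f"s{i}"} for i in range(4)]
taught_by = [{"?C1": f"c{j}", "?F1": f"f{j % 3}"} for j in range(6)]
takes_course = [{"?S1": f"s{i}", "?C1": f"c{i % 6}"} for i in range(8)]


def run(defer):
    space = SolutionSpace(defer=defer)
    for response in (math_faculty, cs_students, taught_by, takes_course):
        space.restrict_to(response)
    return space.final_join(), space.peak
```

Calling `run(True)` and `run(False)` yields the same final rows, but the deferred run never stores the cross product of faculty and students.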

Evaluation
Compare our algorithm against Jena (an in-memory, backward-chaining reasoner with limited capabilities to handle some OWL semantic rules, hence only RDFS semantics were used).
Using LUBM benchmarks representing a knowledge base
  – describing a single university: ~100,000 triples
  – describing 10 universities: ~1,000,000 triples
Using a set of 14 queries taken from LUBM, requiring reasoning over rules associated with either
  – both RDFS and OWL semantics, or
  – the RDFS rules only
The comparison metric is response time.
Response size is used as
  – a sanity check on the correctness of results
  – an indicator of the complexity of reasoning

Evaluation
Comparison against Jena with Backward Chaining
[Table: response time and result size of answerAQuery and Jena Backwd for LUBM Queries 1 to 14, on 1 university (100,839 triples) and on 10 universities (1,272,871 triples). Cell values are missing from the transcript.]

Evaluation
Our algorithm is generally faster than Jena, sometimes by multiple orders of magnitude.
The exceptions are queries with very small result set sizes and queries 10-13, which rely upon OWL semantics and so could not be answered correctly by Jena.
In two queries (2 and 9), Jena timed out.

Evaluation
Comparison against Jena with Hybrid reasoner
[Table: response time and result size of answerAQuery and Jena Hybrid for LUBM Queries 1 to 14, on 1 university (100,839 triples) and on 10 universities (1,272,871 triples). Cell values are missing from the transcript.]

Evaluation
Jena Hybrid means that Jena materializes some rules and therefore starts with a longer list of tuples.
  – Avoiding combinatorial explosions through deferral becomes even more important.
The times here tend to be somewhat closer, but the Jena system has even more difficulty returning any answer at all when working with the larger benchmark.

Conclusions
We reported on our efforts to use backward-chaining reasoners to accommodate a changing knowledge base.
We developed a query-optimization algorithm that works with a reasoner interposed between the knowledge base and the query interpreter.
We compared our implementation with traditional backward-chaining reasoners and found that our implementation
  – could handle much larger knowledge bases
  – supports more complete rule sets (OWL Horst)
  – is faster

Future Work
We will address the issue of scaling the knowledge base to the level that forward-chaining reasoners can handle.
We will work on creating a hybrid reasoner that combines the best of forward chaining and backward chaining.

Answering a query

    QueryResponse answerAQuery(query: Query) {
        // Set up initial SolutionSpace
        SolutionSpace solutionSpace = empty;

        // Repeatedly reduce SolutionSpace by applying
        // the most restrictive pattern
        while (unexplored patterns remain in the query) {
            computeEstimatesOfResponseSize(unexplored patterns);
            QueryPattern p = unexplored pattern with smallest estimate;

            // Restrict SolutionSpace via exploration of p
            QueryResponse answerToP = BackwardChain(p);
            solutionSpace.restrictTo(answerToP);
        }
        return solutionSpace.finalJoin();
    }
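The pseudocode above can be turned into a small runnable sketch. Everything beyond the control flow is an assumption: the reasoner is replaced by a lookup table `KB`, patterns are tuples of strings, `estimate_size` depends only on the pattern (whereas real estimates may be refined as the solution space shrinks), and the Python names are hypothetical renderings of the slide's identifiers.

```python
def natural_join(left, right):
    """Nested-loop natural join over lists of binding dicts (toy sizes only)."""
    shared = set(left[0]) & set(right[0]) if left and right else set()
    return [
        {**a, **b}
        for a in left
        for b in right
        if all(a[v] == b[v] for v in shared)
    ]


def answer_a_query(patterns, backward_chain, estimate_size):
    components = []  # solution space: independent sets of bindings
    unexplored = list(patterns)
    while unexplored:
        # Pick the unexplored pattern with the smallest estimated response.
        pattern = min(unexplored, key=estimate_size)
        unexplored.remove(pattern)

        # Restrict the solution space via exploration of the pattern.
        answer = backward_chain(pattern)
        variables = {term for term in pattern if term.startswith("?")}
        sharing = [c for c in components if c and variables & set(c[0])]
        components = [
            c for c in components if not any(c is s for s in sharing)
        ]
        for component in sharing:
            answer = natural_join(component, answer)
        components.append(answer)

    # Final join of any components whose joins were deferred.
    result = [{}]
    for component in components:
        result = natural_join(result, component)
    return result


# Stand-in for the reasoner: a fixed table of pattern -> response.
KB = {
    ("?S1", "takesCourse", "?C1"): [
        {"?S1": "s1", "?C1": "c1"},
        {"?S1": "s2", "?C1": "c2"},
    ],
    ("?C1", "taughtBy", "fac1"): [{"?C1": "c1"}],
}


def backward_chain(pattern):
    return KB[pattern]


def estimate_size(pattern):
    return len(KB[pattern])
```

Calling `answer_a_query(list(KB), backward_chain, estimate_size)` explores the `taughtBy` pattern first, since its estimate is smallest, and then restricts the result with the `takesCourse` response.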