Interactive repairing of inconsistent knowledge bases


Interactive repairing of inconsistent knowledge bases. DaQuaTa International Workshop, Lyon, France, 12/12/2017. Abdallah Arioua, Angela Bonifati, University of Lyon 1 / LIRIS, France.

1 Motivation: interactive repairing of knowledge bases. Knowledge bases are ubiquitous: the Semantic Web, ontology-based reasoning and data access, big data integration and fusion, and knowledge and concept graphs in industry. Errors can be introduced by mappings, typos, knowledge fusion, etc. Automatic repairing is costly and lossy, so we bring the human into the loop for better quality: the democratization of data cleaning. Ontology-based data access is a new paradigm that seeks to exploit knowledge, typically domain knowledge, when querying data. More precisely, instead of a database, we consider a knowledge base composed of data and an ontology, and we want to query the data while taking into account the inferences enabled by the ontology. More generally, adding an ontological layer on top of data has at least three kinds of well-acknowledged advantages, including the following. Ontologies can be used to enrich the vocabulary of data sources, allowing users to formulate their queries in a richer, more familiar vocabulary that abstracts from the specific way data is stored. By allowing the inference of new facts, ontologies can cope with incomplete data. Data incompleteness may come from a lack of information, but it may also be deliberate, because one does not want to explicitly store all details about objects in the database. 1/20

1 Preliminaries: logical language and knowledge bases. A knowledge base K is a set of facts, TGDs (rules) and CDDs (constraints). Formalism: the TGDs are weakly acyclic; the CDDs correspond to denial constraints with equality. Reasoning. Example: 2/20
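
The reasoning step can be illustrated with a small Python sketch. It assumes an encoding that is not from the slides: atoms are (predicate, args) tuples, terms starting with "?" are variables, and terms starting with "_:" are labelled nulls. `chase` is a restricted chase (a TGD fires only when its head is not already satisfied), whose termination is what weak acyclicity guarantees; `is_consistent` models each CDD as a conjunctive body implying falsum, with equality expressed through shared variables. All names and the medical facts are hypothetical.

```python
from itertools import count

# Assumed encoding: an atom is (predicate, args); terms starting with "?"
# are variables and terms starting with "_:" are labelled nulls.
def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def matches(body, facts, subst=None):
    """Yield every substitution that maps all body atoms onto facts."""
    subst = dict(subst or {})
    if not body:
        yield subst
        return
    (pred, args), rest = body[0], body[1:]
    for fpred, fargs in facts:
        if fpred != pred or len(fargs) != len(args):
            continue
        s, ok = dict(subst), True
        for a, v in zip(args, fargs):
            if is_var(a):
                if s.setdefault(a, v) != v:   # variable already bound to another value
                    ok = False
                    break
            elif a != v:                      # constant in the rule does not match
                ok = False
                break
        if ok:
            yield from matches(rest, facts, s)

_fresh = count(1)

def chase(facts, tgds):
    """Restricted chase: fire a TGD only when its head is not yet satisfied."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in tgds:
            for s in list(matches(body, list(facts))):
                if next(matches(head, list(facts), s), None) is not None:
                    continue                  # head already satisfied
                s = dict(s)
                for _, args in head:          # existential variables get fresh nulls
                    for a in args:
                        if is_var(a) and a not in s:
                            s[a] = f"_:N{next(_fresh)}"
                for pred, args in head:
                    facts.add((pred, tuple(s.get(a, a) for a in args)))
                changed = True
    return facts

def is_consistent(facts, cdds):
    """Assumes facts are already chased: consistent iff no CDD body maps into them."""
    return all(next(matches(body, list(facts)), None) is None for body in cdds)

# Hypothetical example KB.
FACTS = {("prescribed", ("John", "Aspirin")),
         ("contains", ("Aspirin", "Salicylate")),
         ("allergicTo", ("John", "Salicylate"))}
TGDS = [([("prescribed", ("?p", "?m")), ("contains", ("?m", "?s"))],
         [("exposedTo", ("?p", "?s"))]),
        ([("prescribed", ("?p", "?m"))],
         [("treatedBy", ("?p", "?d"))])]      # ?d is existential
CDDS = [[("exposedTo", ("?p", "?s")), ("allergicTo", ("?p", "?s"))]]

print(is_consistent(FACTS, CDDS))               # True
print(is_consistent(chase(FACTS, TGDS), CDDS))  # False
```

The facts alone violate no CDD; the conflict only appears once the TGDs derive exposedTo(John, Salicylate).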

1 Preliminaries: inconsistency handling. A knowledge base K is inconsistent iff applying its TGDs to its facts yields a violation of some CDD. When K is inconsistent: conflict discovery; repairing K using deletions (remove the facts involved in conflicts, a lossy approach); repairing K using updates, e.g. "John has an allergy to Penicillin rather than Aspirin" or "Penicillin is prescribed to John rather than Aspirin", …; user interaction. 3/20

Outline: update-based repairing; user intervention; questioning strategies; experimental study; conclusion. Problem: given a KB equipped with a set of TGDs and CDDs, produce an error-free KB, accounting for the interplay of TGDs and CDDs and minimizing user interaction.

2 Update-based repairing: introduction. Repairing using updates has been studied in KBs and RDBs [1], on functional dependencies (FDs), conditional FDs and denial constraints (DCs). Repairing using updates in KBs: TGDs are natural in KB reasoning, but they may introduce new conflicts. Alternating approach: 1. apply the TGDs, 2. apply the CDDs, then go back to 1 and repeat. This is computationally expensive, overwhelming for the user, and raises finiteness issues. 4/20 [1] Philip Bohannon et al. 2005; Kolahi and Lakshmanan 2009; Yakout et al. 2011; Xu Chu et al. 2013; Farid et al. 2016.

2 Update-based repairing: basic concepts. A set of fixes P is called a consistent fix if it produces a consistent KB. P is a repair fix if the resulting KB is consistent and minimally changed (w.r.t. set inclusion). Example: 5/20
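
The two notions can be checked by brute force on a toy KB. This sketch assumes a KB given as a map from fact ids to (predicate, args), a fix given as (fact_id, arg_index, new_value), and no TGDs; it reads "minimally changed w.r.t. set inclusion" as "no proper subset of the fixes already yields a consistent KB". The exhaustive subset check is exponential and only illustrates the definition; all names and values are hypothetical.

```python
from itertools import combinations

# Assumed encoding: kb maps a fact id to (predicate, args); a fix is
# (fact_id, arg_index, new_value); "?"-prefixed pattern terms are variables.
def maps_into(body, facts, s=None):
    """True iff some substitution maps every body atom onto a fact."""
    s = s or {}
    if not body:
        return True
    (pred, args), rest = body[0], body[1:]
    for fpred, fargs in facts:
        if fpred == pred and len(fargs) == len(args):
            t = dict(s)
            if all(t.setdefault(a, v) == v if a.startswith("?") else a == v
                   for a, v in zip(args, fargs)) and maps_into(rest, facts, t):
                return True
    return False

def apply_fixes(kb, fixes):
    new = {fid: list(args) for fid, (_, args) in kb.items()}
    for fid, i, value in fixes:
        new[fid][i] = value
    return [(kb[fid][0], tuple(args)) for fid, args in new.items()]

def is_consistent_fix(kb, cdds, fixes):
    facts = apply_fixes(kb, fixes)
    return not any(maps_into(body, facts) for body in cdds)

def is_repair_fix(kb, cdds, fixes):
    """Consistent, and no proper subset of the fixes is already consistent."""
    fixes = list(fixes)
    return is_consistent_fix(kb, cdds, fixes) and not any(
        is_consistent_fix(kb, cdds, sub)
        for k in range(len(fixes))
        for sub in combinations(fixes, k))

KB = {1: ("prescribed", ("John", "Aspirin")), 2: ("allergicTo", ("John", "Aspirin"))}
CDDS = [[("prescribed", ("?p", "?m")), ("allergicTo", ("?p", "?m"))]]

one = [(2, 1, "Penicillin")]
two = [(2, 1, "Penicillin"), (1, 1, "Ibuprofen")]
print(is_consistent_fix(KB, CDDS, two), is_repair_fix(KB, CDDS, two))  # True False
print(is_repair_fix(KB, CDDS, one))                                     # True
```

The two-fix set is consistent but not minimal, because its first fix alone already removes the conflict.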

2 Update-based repairing: Π-repairability. Repairing with an immutable set of positions Π: trusted positions or previously fixed positions. Some KBs cannot be fixed when some positions are immutable. Example: an inconsistent KB that is not Π-repairable. Checking Π-repairability: change all non-immutable positions to unique labelled nulls, then check consistency. The procedure is sound and complete, and runs in linear time (data complexity). 6/20
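
The Π-repairability check translates almost literally into code. In this sketch (assumed encoding: fact ids mapped to (predicate, args), positions as (fact_id, arg_index), CDDs only, no TGDs) every mutable position is replaced by a fresh labelled null and the relaxed KB is checked for consistency. The naive matcher used here does not attempt the linear-time bound stated on the slide; names and data are hypothetical.

```python
from itertools import count

# Assumed encoding: kb maps a fact id to (predicate, args); a position is
# (fact_id, arg_index); "?"-prefixed pattern terms are variables. CDDs only.
def maps_into(body, facts, s=None):
    """True iff some substitution maps every body atom onto a fact."""
    s = s or {}
    if not body:
        return True
    (pred, args), rest = body[0], body[1:]
    for fpred, fargs in facts:
        if fpred == pred and len(fargs) == len(args):
            t = dict(s)
            if all(t.setdefault(a, v) == v if a.startswith("?") else a == v
                   for a, v in zip(args, fargs)) and maps_into(rest, facts, t):
                return True
    return False

def pi_repairable(kb, cdds, immutable):
    """Replace every non-immutable position by a unique labelled null, then
    check consistency. Nulls are distinct strings, so they never join."""
    fresh = count(1)
    relaxed = [(pred, tuple(v if (fid, i) in immutable else f"_:N{next(fresh)}"
                            for i, v in enumerate(args)))
               for fid, (pred, args) in kb.items()]
    return not any(maps_into(body, relaxed) for body in cdds)

KB = {1: ("prescribed", ("John", "Aspirin")), 2: ("allergicTo", ("John", "Aspirin"))}
CDDS = [[("prescribed", ("?p", "?m")), ("allergicTo", ("?p", "?m"))]]
ALL = {(fid, i) for fid, (_, args) in KB.items() for i in range(len(args))}

print(pi_repairable(KB, CDDS, ALL))             # False: every position is trusted
print(pi_repairable(KB, CDDS, ALL - {(2, 1)}))  # True: the allergy drug may change
```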

3 User intervention: basic definitions. A question Q is a finite set of fixes. If all the fixes in Q yield a Π-repairable KB, then Q is sound. The user chooses a fix from Q as an answer. A sequence of sound questions and answers is called an inquiry over K. Q: which of the following fixes is true? Example: 7/20
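
Soundness of a question can be sketched by testing each candidate fix: apply it, add its position to the immutable set Π (it becomes a previously fixed position), and check Π-repairability. The encoding, helper names, and candidate values are assumptions of this sketch, which considers CDDs only (no TGDs).

```python
from itertools import count

# Assumed encoding: kb maps fact ids to (predicate, args); a fix is
# ((fact_id, arg_index), new_value); "?"-prefixed terms are variables; CDDs only.
def maps_into(body, facts, s=None):
    """True iff some substitution maps every body atom onto a fact."""
    s = s or {}
    if not body:
        return True
    (pred, args), rest = body[0], body[1:]
    for fpred, fargs in facts:
        if fpred == pred and len(fargs) == len(args):
            t = dict(s)
            if all(t.setdefault(a, v) == v if a.startswith("?") else a == v
                   for a, v in zip(args, fargs)) and maps_into(rest, facts, t):
                return True
    return False

def pi_repairable(kb, cdds, immutable):
    """Mutable positions become unique labelled nulls; then check consistency."""
    fresh = count(1)
    relaxed = [(pred, tuple(v if (fid, i) in immutable else f"_:N{next(fresh)}"
                            for i, v in enumerate(args)))
               for fid, (pred, args) in kb.items()]
    return not any(maps_into(body, relaxed) for body in cdds)

def is_sound_fix(kb, cdds, immutable, fix):
    """Apply the fix, mark its position as fixed, check Pi-repairability."""
    (fid, i), value = fix
    pred, args = kb[fid]
    fixed = dict(kb)
    fixed[fid] = (pred, args[:i] + (value,) + args[i + 1:])
    return pi_repairable(fixed, cdds, set(immutable) | {(fid, i)})

def sound_question(kb, cdds, immutable, candidates):
    """A question is a finite set of fixes; keep only the sound candidates."""
    return {fix for fix in candidates if is_sound_fix(kb, cdds, immutable, fix)}

KB = {1: ("prescribed", ("John", "Aspirin")), 2: ("allergicTo", ("John", "Aspirin"))}
CDDS = [[("prescribed", ("?p", "?m")), ("allergicTo", ("?p", "?m"))]]
trusted = {(1, 0), (1, 1), (2, 0)}       # all positions except the allergy drug
candidates = {((2, 1), "Penicillin"), ((2, 1), "Aspirin")}
print(sound_question(KB, CDDS, trusted, candidates))  # {((2, 1), 'Penicillin')}
```

Keeping Aspirin would freeze every position of the conflict, so that fix is filtered out of the question.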

3 User intervention: generating sound questions and inquiries. Procedure: generate a sound question by filtering values; ask the user and update; continue until no conflict is left. 8/20
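
The procedure can be sketched as a loop over conflicts. This is a minimal Python sketch under several assumptions not stated on the slides: the KB maps fact ids to (predicate, args), there are no TGDs, candidate values come from a hypothetical finite `domain`, a question lists every sound fix that changes a value in the first remaining conflict, and the user is simulated by an `ask` callback. Each answer makes its position immutable, so the loop asks at most one question per position.

```python
from itertools import count

# Assumed encoding: kb maps fact ids to (predicate, args); "?"-prefixed
# pattern terms are variables; CDDs only (no TGDs).
def homs(body, kb, s=None, used=()):
    """Yield (substitution, matched fact ids) for each way the body maps into kb."""
    s = s or {}
    if not body:
        yield s, used
        return
    (pred, args), rest = body[0], body[1:]
    for fid, (fpred, fargs) in kb.items():
        if fpred == pred and len(fargs) == len(args):
            t = dict(s)
            if all(t.setdefault(a, v) == v if a.startswith("?") else a == v
                   for a, v in zip(args, fargs)):
                yield from homs(rest, kb, t, used + (fid,))

def conflicts(kb, cdds):
    return [set(ids) for body in cdds for _, ids in homs(body, kb)]

def pi_repairable(kb, cdds, immutable):
    """Mutable positions become unique labelled nulls; then check consistency."""
    fresh = count(1)
    relaxed = {fid: (pred, tuple(v if (fid, i) in immutable else f"_:N{next(fresh)}"
                                 for i, v in enumerate(args)))
               for fid, (pred, args) in kb.items()}
    return not conflicts(relaxed, cdds)

def with_value(kb, pos, value):
    (fid, i), new = pos, dict(kb)
    pred, args = kb[fid]
    new[fid] = (pred, args[:i] + (value,) + args[i + 1:])
    return new

def inquiry(kb, cdds, domain, ask, immutable=frozenset()):
    """Ask sound questions until no conflict is left; return (kb, #questions)."""
    kb, immutable, asked = dict(kb), set(immutable), 0
    while found := conflicts(kb, cdds):
        conflict = found[0]                       # naive choice of conflict
        positions = [(fid, i) for fid in sorted(conflict)
                     for i in range(len(kb[fid][1])) if (fid, i) not in immutable]
        # The question: every value-changing fix that keeps the KB Pi-repairable.
        question = [(pos, v) for pos in positions for v in domain
                    if v != kb[pos[0]][1][pos[1]]
                    and pi_repairable(with_value(kb, pos, v), cdds, immutable | {pos})]
        if not question:
            raise ValueError("no sound fix among the candidate values")
        pos, value = ask(question)                # the user picks one fix
        kb, asked = with_value(kb, pos, value), asked + 1
        immutable.add(pos)                        # answered positions are trusted
    return kb, asked

KB = {1: ("prescribed", ("John", "Aspirin")), 2: ("allergicTo", ("John", "Aspirin"))}
CDDS = [[("prescribed", ("?p", "?m")), ("allergicTo", ("?p", "?m"))]]
TRUTH = {(2, 1): "Penicillin"}                    # what the simulated oracle knows

def oracle(question):
    return next(f for f in question if TRUTH.get(f[0]) == f[1])

fixed, n = inquiry(KB, CDDS, ["Aspirin", "Penicillin"], oracle)
print(n, fixed[2])  # 1 ('allergicTo', ('John', 'Penicillin'))
```

With an oracle that knows John is allergic to Penicillin, the inquiry ends after a single question.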

3 User intervention: results. Proposition 1: questions are sound and polynomial. Proposition 2: the procedure runs in finite time and produces a consistent knowledge base K. When does the procedure produce a repair of K? If the user is an oracle, then K is minimally repaired. An oracle is a user who knows everything about K (a domain and knowledge expert). Proposition 3: the delay time between questions is polynomial. 9/20

4 Questioning strategies: intuition by examples. Consider the following knowledge base and its conflicts. Random strategy: pick a conflict and generate all possible fixes. Some of these fixes do not resolve the conflict; if the user chooses one of them, we have to ask more questions. Hence: join positions! 10/20

4 Questioning strategies: intuition by examples. Consider the following knowledge base and its conflicts. Join strategy: pick a conflict and generate all possible fixes over join positions only. This reduces the size of the next questions and prunes some questions, so we ask fewer questions. Improvement: propagate fixes. 11/20
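
Join positions can be read off a CDD body. Changing the value at a position that holds a variable occurring more than once breaks that particular match, whereas a position holding a variable that occurs only once (such as the date below) can take any value without resolving the conflict. The CDD, the facts, and the helper names are hypothetical, and positions holding constants are ignored in this sketch.

```python
from collections import Counter

def join_positions(body):
    """(atom_index, arg_index) pairs of a CDD body holding a variable that occurs
    more than once; positions with constants are ignored in this sketch."""
    occurrences = Counter(a for _, args in body for a in args if a.startswith("?"))
    return {(j, i) for j, (_, args) in enumerate(body)
            for i, a in enumerate(args) if occurrences[a] > 1}

def join_fixes(body, conflict_ids, kb, domain):
    """Join strategy: candidate fixes only over the join positions of one
    conflict; conflict_ids[j] is the fact id matched by body atom j."""
    return [((conflict_ids[j], i), v)
            for j, i in sorted(join_positions(body))
            for v in domain if v != kb[conflict_ids[j]][1][i]]

# Hypothetical CDD: ?date occurs once, so changing a date never resolves a conflict.
CDD = [("prescribed", ("?p", "?m", "?date")), ("allergicTo", ("?p", "?m"))]
KB = {1: ("prescribed", ("John", "Aspirin", "2017-12-12")),
      2: ("allergicTo", ("John", "Aspirin"))}

print(sorted(join_positions(CDD)))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
fixes = join_fixes(CDD, [1, 2], KB, ["Penicillin"])
print(all(pos[1] != 2 for pos, _ in fixes))  # True: no fix touches the date
```

A random strategy would also offer fixes on the date position; the join strategy never asks about it.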

4 Questioning strategies: intuition by examples. Consider the following knowledge base and its conflicts. MCD strategy: rank join positions by the number of conflicts that include them and choose the top-ranked one. In the example, one question addresses 2 conflicts. Observation: the more the conflicts overlap, the fewer questions we ask. 12/20
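
The ranking step can be sketched with a counter over join positions. The sketch assumes each conflict is already given as its set of join positions (fact_id, arg_index); ties are broken by sorted order, an arbitrary choice made here and not stated on the slides. The example data are hypothetical.

```python
from collections import Counter

def mcd_pick(conflicts):
    """Rank join positions by the number of conflicts that contain them and
    return the top one with its count (ties broken by sorted order)."""
    counts = Counter(pos for positions in conflicts for pos in set(positions))
    best = max(sorted(counts), key=counts.__getitem__)
    return best, counts[best]

# Hypothetical: facts 1 and 2 prescribe Aspirin to John on two dates, fact 3 says
# John is allergic to Aspirin. Each conflict is given by its join positions.
c1 = {(1, 0), (1, 1), (3, 0), (3, 1)}
c2 = {(2, 0), (2, 1), (3, 0), (3, 1)}
print(mcd_pick([c1, c2]))  # ((3, 0), 2): a question on fact 3 covers both conflicts
```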

5 Experimental study: experimental environment. Variables: effectiveness (average number of questions per strategy and average number of conflicts per question) and delay time (average delay time between asked questions). Environment: Java 8, 2.40 GHz 4-core CPU, 16 GB RAM (Windows 7). Multi-trial experiments with a cold start. For each experimental variable, we test our approach on synthetic and real-world datasets. 13/20

5 Experimental study: effectiveness. KBs: Durum Wheat KB v1, manually constructed; its TGDs and CDDs have been validated by experts. Summary v1: 567 atoms, 269 TGDs, 27 CDDs, 185 conflicts. Summary v2: 567 atoms, 269 TGDs, 100 CDDs, 212 conflicts. Results: 14/20

5 Experimental study: effectiveness on synthetic KBs (no TGDs). Results: 15/20

5 Experimental study: effectiveness on synthetic KBs (convergence). Results: 16/20


5 Experimental study: delay time on synthetic KBs (only CDDs). A reasonable delay time is less than 1 to 2 seconds [1]. The MCD strategy is used. Durum Wheat v1 and v2: less than 1 second. Results: 17/20 [1] Robert B. Miller, Response time in man-computer conversational transactions, 1968.

5 Experimental study: delay time on synthetic KBs (CDDs and TGDs). A reasonable delay time is less than 1 to 2 seconds [1]. The MCD strategy is used. Durum Wheat v1 and v2: less than 1 second. Results: datasets D1 to D4 (TGDs, CDDs): D1 50, 150; D2 100; D3; D4 200. Size: 400 atoms. Inc. ratio: 100%. 18/20

6 Conclusion. Summary: update-based repairing of inconsistent knowledge bases; interactive repairing in the presence of interacting dependencies; strategies for minimizing interaction; the approach can be applied to portions of large knowledge bases; the delay time is reasonable. Perspectives: full denial constraints (but undecidability!); other data cleaning constraints (CFDs, metric FDs, etc.). 19/20

End. Thank you! Questions? 20/20