Presentation is loading. Please wait.

Presentation is loading. Please wait.

Interactive repairing of inconsistent knowledge bases

Similar presentations


Presentation on theme: "Interactive repairing of inconsistent knowledge bases"— Presentation transcript:

1 Interactive repairing of inconsistent knowledge bases
DaQuaTa International Workshop Lyon - France 12/12/2017 Interactive repairing of inconsistent knowledge bases Abdallah Arioua Angela Bonifati University of Lyon 1 / LIRIS France.

2 1 Motivation Interactive repairing of knowledge bases
Motivation Interactive repairing of knowledge bases Knowledge Bases are ubiquitous: The Semantic Web, ontology-based reasoning and data access. Big data integration and fusion. Knowledge and concept graphs in industry Errors can be introduced from mappings, typos, knowledge fusion, etc. Automatic repairing is costly and lossy. Bring human to the loop for a better quality. The democratization of data cleaning. A new paradigm that seeks to exploit knowledge, typically domain knowledge, when querying data. More precisely, instead of a database, we consider a knowledge base which is composed of data and of an ontology, and we want to query the data while taking into account inferences enabled by the ontology. More generally, adding an ontological layer on top of data has at least three kinds of well-acknowledged advantages: Ontologies can be used to enrich the vocabulary of data sources, thereby allowing users to formulate their queries in a richer, more familiar vocabulary which abstracts from the specific way data is stored. By allowing inference of new facts, ontologies allow for incomplete data. Data incompleteness may come from a lack of information but it may also be deliberate because one does not want to explicitely store all details about objects in the database. 1/20

3 1 Preliminaries Logical language and knowledge bases
Preliminaries Logical language and knowledge bases A knowledge base K is a set of facts, TGDs (rules) and CDDs (constraints). Formalism Weakly-acyclic Correspond to Denial Constraints with equality 2/20

4 1 Preliminaries Logical language and knowledge bases
Preliminaries Logical language and knowledge bases A knowledge base K is a set of facts, TGDs (rules) and CDDs (constraints). Reasoning Formalism 2/20

5 1 Preliminaries Logical language and knowledge bases
Preliminaries Logical language and knowledge bases A knowledge base K is a set of facts, TGDs (rules) and CDDs (constraints). Reasoning Formalism Example: 2/20

6 1 Preliminaries Inconsistency handling
Preliminaries Inconsistency handling A knowledge base K is inconsistent iff: K is inconsistent: Conflicts discovery Repairing K using Deletions: Remove facts that are involved in conflicts. Lossy approach. Repairing K using Updates: E.g. John has an allergy against Penicillin rather than Aspirin Penicillin is prescribed to John rather than Aspirin User interaction 3/20

7 O utline Update-based repairing User intervention
Given a KB equipped with a set of TGDs and CDDs, produce an error-free KB: Accounting for the interplay of TGDs and CDDs. Minimizing user interaction. utline O Update-based repairing User intervention Questioning strategies Experimental study Conclusion A new paradigm that seeks to exploit knowledge, typically domain knowledge, when querying data. More precisely, instead of a database, we consider a knowledge base which is composed of data and of an ontology, and we want to query the data while taking into account inferences enabled by the ontology. More generally, adding an ontological layer on top of data has at least three kinds of well-acknowledged advantages: Ontologies can be used to enrich the vocabulary of data sources, thereby allowing users to formulate their queries in a richer, more familiar vocabulary which abstracts from the specific way data is stored. By allowing inference of new facts, ontologies allow for incomplete data. Data incompleteness may come from a lack of information but it may also be deliberate because one does not want to explicitely store all details about objects in the database.

8 Update-based repairing
2 Update-based repairing Introduction Repairing using updates in KBs and RDBs1 On Functional Dependencies (FDs), Conditional FDs, Denial Constraints (DCs). Repairing using updates in KBs: TGDs are natural in KB reasoning but they may introduce new conflicts. 1. Apply TGDs then 2. apply CDDs then go to 1. and repeat. + TGDs CDDs Computationally expensive, overwhelming and some finiteness issues. A new paradigm that seeks to exploit knowledge, typically domain knowledge, when querying data. More precisely, instead of a database, we consider a knowledge base which is composed of data and of an ontology, and we want to query the data while taking into account inferences enabled by the ontology. More generally, adding an ontological layer on top of data has at least three kinds of well-acknowledged advantages: Ontologies can be used to enrich the vocabulary of data sources, thereby allowing users to formulate their queries in a richer, more familiar vocabulary which abstracts from the specific way data is stored. By allowing inference of new facts, ontologies allow for incomplete data. Data incompleteness may come from a lack of information but it may also be deliberate because one does not want to explicitely store all details about objects in the database. 4/20 1 Philip Bohannon et al. 2005, Kolahi and Lakshmanan 2009, Yakout et al. 2011, Xu Chu et al. 2013, Farid et al. 2016

9 Update-based repairing
2 Update-based repairing Basic concepts A set of fixes P is called consistent fix if it produces a consistent KB. P is a repair fix if KB is consistent and minimally changed (w.r.t set inclusion). Example: A new paradigm that seeks to exploit knowledge, typically domain knowledge, when querying data. More precisely, instead of a database, we consider a knowledge base which is composed of data and of an ontology, and we want to query the data while taking into account inferences enabled by the ontology. More generally, adding an ontological layer on top of data has at least three kinds of well-acknowledged advantages: Ontologies can be used to enrich the vocabulary of data sources, thereby allowing users to formulate their queries in a richer, more familiar vocabulary which abstracts from the specific way data is stored. By allowing inference of new facts, ontologies allow for incomplete data. Data incompleteness may come from a lack of information but it may also be deliberate because one does not want to explicitely store all details about objects in the database. 5/20

10 Update-based repairing
2 Update-based repairing ∏-Repairability Repairing with immutable set of positions ¦. Trusted positions or previously fixed positions. Some KBs cannot be fixed when some positions are immutable. Example: Consider: Not p-repairable Checking ¦-repairability: Change all non-immutable positions to unique labelled nulls. Check consistency. The procedure is sound, complete and computed in linear time (data complexity). Inconsistent A new paradigm that seeks to exploit knowledge, typically domain knowledge, when querying data. More precisely, instead of a database, we consider a knowledge base which is composed of data and of an ontology, and we want to query the data while taking into account inferences enabled by the ontology. More generally, adding an ontological layer on top of data has at least three kinds of well-acknowledged advantages: Ontologies can be used to enrich the vocabulary of data sources, thereby allowing users to formulate their queries in a richer, more familiar vocabulary which abstracts from the specific way data is stored. By allowing inference of new facts, ontologies allow for incomplete data. Data incompleteness may come from a lack of information but it may also be deliberate because one does not want to explicitely store all details about objects in the database. 6/20

11 3 User intervention Basic definitions Example:
User intervention Basic definitions A question © is a finite set of fixes. If all the fixes in © yield a ¦-repairable KB then © is sound. The user choses a fix from © as an answer. A sequence of sound questions and answers is called an inquiry over K. ©: which fix is true from the following set? Example: A new paradigm that seeks to exploit knowledge, typically domain knowledge, when querying data. More precisely, instead of a database, we consider a knowledge base which is composed of data and of an ontology, and we want to query the data while taking into account inferences enabled by the ontology. More generally, adding an ontological layer on top of data has at least three kinds of well-acknowledged advantages: Ontologies can be used to enrich the vocabulary of data sources, thereby allowing users to formulate their queries in a richer, more familiar vocabulary which abstracts from the specific way data is stored. By allowing inference of new facts, ontologies allow for incomplete data. Data incompleteness may come from a lack of information but it may also be deliberate because one does not want to explicitely store all details about objects in the database. 7/20

12 3 User intervention Generating sound questions and inquiries
User intervention Generating sound questions and inquiries Procedure: Generate a sound question by filtering values. Ask the user and update, continue until no conflict is left. A new paradigm that seeks to exploit knowledge, typically domain knowledge, when querying data. More precisely, instead of a database, we consider a knowledge base which is composed of data and of an ontology, and we want to query the data while taking into account inferences enabled by the ontology. More generally, adding an ontological layer on top of data has at least three kinds of well-acknowledged advantages: Ontologies can be used to enrich the vocabulary of data sources, thereby allowing users to formulate their queries in a richer, more familiar vocabulary which abstracts from the specific way data is stored. By allowing inference of new facts, ontologies allow for incomplete data. Data incompleteness may come from a lack of information but it may also be deliberate because one does not want to explicitely store all details about objects in the database. 8/20

13 3 User intervention Generating sound questions and inquiries
User intervention Generating sound questions and inquiries Procedure: Generate a sound question by filtering values. Ask the user and update, continue until no conflict is left. A new paradigm that seeks to exploit knowledge, typically domain knowledge, when querying data. More precisely, instead of a database, we consider a knowledge base which is composed of data and of an ontology, and we want to query the data while taking into account inferences enabled by the ontology. More generally, adding an ontological layer on top of data has at least three kinds of well-acknowledged advantages: Ontologies can be used to enrich the vocabulary of data sources, thereby allowing users to formulate their queries in a richer, more familiar vocabulary which abstracts from the specific way data is stored. By allowing inference of new facts, ontologies allow for incomplete data. Data incompleteness may come from a lack of information but it may also be deliberate because one does not want to explicitely store all details about objects in the database. 8/20

14 3 User intervention Results
User intervention Results Proposition 1: questions are sound and polynomial. Proposition 2: the procedure runs in finite time and produces a consistent knowledge base K. When the procedure produces a repair of K? If the user is an oracle then K is minimally repaired. An oracle is a user who knows everything about K (a domain and knowledge expert). Proposition 3: delay time between questions is polynomial. A new paradigm that seeks to exploit knowledge, typically domain knowledge, when querying data. More precisely, instead of a database, we consider a knowledge base which is composed of data and of an ontology, and we want to query the data while taking into account inferences enabled by the ontology. More generally, adding an ontological layer on top of data has at least three kinds of well-acknowledged advantages: Ontologies can be used to enrich the vocabulary of data sources, thereby allowing users to formulate their queries in a richer, more familiar vocabulary which abstracts from the specific way data is stored. By allowing inference of new facts, ontologies allow for incomplete data. Data incompleteness may come from a lack of information but it may also be deliberate because one does not want to explicitely store all details about objects in the database. 9/20

15 Questioning strategies
4 Questioning strategies Intuition by examples Consider the following knowledge base: Conflicts Random strategy Pick a conflict and generate all possible fixes. No resolution if the user chooses: We ask more questions. Join positions! 10/20

16 Questioning strategies
4 Questioning strategies Intuition by examples Consider the following knowledge base: Conflicts Join strategy Pick a conflict and generate all possible fixes over join positions. We ask less questions 11/20

17 Questioning strategies
4 Questioning strategies Intuition by examples Consider the following knowledge base: Conflicts Join strategy Pick a conflict and generate all possible fixes over join positions. Reduce the size of the next questions. Prune some questions. We ask less questions Improvement: propagate fixes. 11/20

18 Questioning strategies
4 Questioning strategies Intuition by examples Consider the following knowledge base: Conflicts MCD strategy Rank join positions w.r.t inclusion in conflicts and choose the top ranked. 2 conflicts by one question in the example. Observation: more overlapping less questions 12/20

19 5 Experimental study Experimental environment Variables:
Experimental study Experimental environment Variables: Effectiveness: avg number of questions per strategy and average number of conflicts per question. Delay time: average delay time between asked question. Environment: Java 8, 2.40GHz 4core, 16G RAM (windows 7). Multi trial experiments with a cold start. For each experimental variable we test our approach on synthetic and real-world datasets. 13/20

20 5 Experimental study Effectiveness KBs:
Experimental study Effectiveness KBs: Durum Wheat Kb v1: manually constructed. TGDs and CDDs have been validated by experts. Summary v1: 567 atoms, TGDs=269 , CDD=27, 185 conflicts. Summary v2: 567 atoms, TGDs=269, CDD=100, 212 conflicts. Results: 14/20

21 5 Experimental study Effectiveness: synthetic KBs no TGDs Results:
5 Experimental study Effectiveness: synthetic KBs no TGDs Results: 15/20

22 5 Experimental study Effectiveness: synthetic KBs (convergence)
5 Experimental study Effectiveness: synthetic KBs (convergence) Results: 16/20

23 Experimental study Effectiveness: synthetic KBs (convergence) Results:

24 5 Experimental study Delay time: synthetic KBs only CDDs
Experimental study Delay time: synthetic KBs only CDDs Reasonable delay time: less than 1 to 2 seconds1. MCD strategy is used. Drum wheat v1&2 less than 1 sec. Results: 17/20 1 Robert B Miller, Response time in man-computer conversational transactions,

25 5 Experimental study Delay time: synthetic KBs CDDs and TGDs
5 Experimental study Delay time: synthetic KBs CDDs and TGDs Reasonable delay time: less than 1 to 2 seconds1. MCD strategy is used. Drum wheat v1&2 less than 1 sec. Results: TGDs CDDs D1 50 150 D2 100 D3 D4 200 Size 400 atoms Inc ratio 100% 18/20 1 Robert B Miller, Response time in man-computer conversational transactions,

26 6 Conclusion Update-based repairing of inconsistent knowledge bases.
Conclusion Summary: Update-based repairing of inconsistent knowledge bases. Interactive repairing in presence of interacting dependencies. Strategies for interaction minimization. Approach can be applied on portions of large knowledge bases. Delay time is reasonable. Perspectives: Full Denial Constraints (but undecidability!). Other Data Cleaning constraints (CFDs, Metric FDs etc.). 19/20

27 End! Thank you! Questions 20/20


Download ppt "Interactive repairing of inconsistent knowledge bases"

Similar presentations


Ads by Google