Published by Jared Flynn. Modified over 9 years ago.
1
Enabling Ontology Evolution in Data Integration
Haridimos Kondylakis, Dimitris Plexousakis, Yannis Tzitzikas
Computer Science Department, University of Crete
Information Systems Laboratory, FORTH-ICS
2
Problem Statement
[Figure: a data integration system. A query is decomposed into sub-queries that are sent, through the mappings, to the source DBs.]
3
Outline
1. Past Approaches
2. Our Idea
3. Modelling Ontology Evolution
4. Rewritings among Ontology Versions
5. Problems & Solutions
6. Rewritings to the Sources
7. Implementation/Evaluation
8. Conclusions
4
1. Past Approaches (1/2)
Mapping Adaptation (Velegrakis, 2004)
Idea: after each small evolution step, the mappings can be adapted incrementally by applying local modifications.
[Figure: a source S mapped to successive versions O, O1, O2 through mappings M1, M2, M3, with primitive changes such as "move elem", "add elem" and "delete constraint" between versions.]
Drawbacks: the approach is system-dependent. The list of changes may not be given and must then be discovered (how?), and multiple lists of changes may lead to the same effect. It cannot handle complex change operations such as split and merge. The algorithm must be reapplied after each primitive change, which is inefficient when the list of changes is long. Finally, there is no precise criterion under which the adapted mappings indeed constitute the "right result".
5
1. Past Approaches (2/2)
Mapping Composition (Bernstein, 2008)
Idea: given the mappings M from the sources S to the ontology O, and the evolution of O into O′ expressed as a mapping E, is it possible to generate mappings $M' = M \circ E$ that are equivalent to the original mappings? Schema mapping tools can be used to construct E.
Drawbacks: there is no known implementation for ontology evolution. First-order mappings are not closed under composition, while second-order mappings are too difficult to handle, are not supported by DBMSs (and are not likely to be in the future either), and are not understood by domain experts. Moreover, the composition must be produced for all mappings, and there may be several sets of mappings between each T and T′.
6
"Everything should be made as simple as possible, but not simpler." (Albert Einstein)
[Figure: a data integration system that uses an RDF/S ontology as the global schema, queried in SPARQL and mapped to the source DBs.]
Using the ontology as the global schema is system-independent and more intuitive. There is only one set of mappings, which is modular, created only once, and verifiable.
7
"Everything that exists, it is only change." (Heraclitus, 535 BCE)
Definition (Change Operation). A change u from one ontology version $O_1$ to another version $O_2$ is defined as a tuple $(\delta_a, \delta_d)$, where $\delta_a$ corresponds to the triples that are added to $O_1$ in order to get $O_2$, and $\delta_d$ corresponds to the triples that are deleted from $O_1$ in order to get $O_2$. Every change satisfies $\delta_a(u) \cup \delta_d(u) \neq \emptyset$ and $\delta_a(u) \cap \delta_d(u) = \emptyset$, and any two changes $u_1, u_2$ satisfy $\delta_a(u_1) \cap \delta_a(u_2) = \emptyset$ and $\delta_d(u_1) \cap \delta_d(u_2) = \emptyset$.
Definition (Application semantics of a high-level change). The application of u upon O, denoted by $u(O)$, is defined as $u(O) = (O \cup \delta_a(u)) \setminus \delta_d(u)$.
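The two definitions translate almost directly into set operations. The following is a minimal Python sketch, not the authors' implementation; it assumes each triple is encoded as a (predicate, subject, object) tuple, and the names `Change` and `apply_change` are hypothetical. The constructor enforces the two per-change constraints, and `apply_change` computes (O ∪ δ_a(u)) \ δ_d(u).

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

# Hypothetical encoding: has_gender(Person, Gender) -> ("has_gender", "Person", "Gender")
Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class Change:
    """A high-level change u = (delta_a, delta_d) between two ontology versions."""
    kind: str                   # descriptive label only, e.g. "Delete" or "Merge"
    added: FrozenSet[Triple]    # delta_a(u): triples added to O_1 to get O_2
    deleted: FrozenSet[Triple]  # delta_d(u): triples deleted from O_1 to get O_2

    def __post_init__(self) -> None:
        # delta_a(u) ∪ delta_d(u) must be non-empty
        if not (self.added | self.deleted):
            raise ValueError("a change must add or delete at least one triple")
        # delta_a(u) ∩ delta_d(u) must be empty
        if self.added & self.deleted:
            raise ValueError("a triple cannot be both added and deleted")


def apply_change(u: Change, ontology: FrozenSet[Triple]) -> FrozenSet[Triple]:
    """Application semantics: u(O) = (O ∪ delta_a(u)) minus delta_d(u)."""
    return (ontology | u.added) - u.deleted


if __name__ == "__main__":
    O1 = frozenset({("has_gender", "Person", "Gender"), ("domain", "Person", "name")})
    u1 = Change("Delete", frozenset(), frozenset({("has_gender", "Person", "Gender")}))
    print(apply_change(u1, O1))  # frozenset({('domain', 'Person', 'name')})
```

Representing versions as frozensets keeps every application side-effect free, so earlier versions remain available for comparison.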
8
3.1 Example
Each change is written as (type, δ_a, δ_d):
u1 = (Delete, ∅, {has_gender(Person, Gender)})
u2 = (Move, {has_cont_point(Person, Cont_Point)}, {has_cont_point(Actor, Cont_Point)})
u3 = (Merge, {domain(Cont_Point, address)}, {domain(Cont_Point, street), domain(Cont_Point, city)})
u4 = (Rename, {domain(Person, fullname)}, {domain(Person, name)})
[Figure: example ontology with classes Person, Actor, Gender and Cont. Point, and properties name/fullname, ssn, has_gender, has_cont_point, street, city and address.]
High-level changes are intuitive and concise, and can describe complex evolution.
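Assuming the same (predicate, subject, object) triple encoding, the four changes of the example can be replayed as an evolution log. In this sketch each change is a (type, δ_a, δ_d) tuple, the old version `O1` is a hypothetical fragment containing only the triples the log touches, and `apply_log` and `pairwise_disjoint` are illustrative names; the latter checks the cross-change disjointness constraints of the change-operation definition.

```python
from functools import reduce
from itertools import combinations
from typing import FrozenSet, List, Tuple

Triple = Tuple[str, str, str]                              # predicate(subject, object)
Change = Tuple[str, FrozenSet[Triple], FrozenSet[Triple]]  # (type, delta_a, delta_d)


def apply_change(ontology: FrozenSet[Triple], u: Change) -> FrozenSet[Triple]:
    _, added, deleted = u
    return (ontology | added) - deleted


def apply_log(ontology: FrozenSet[Triple], log: List[Change]) -> FrozenSet[Triple]:
    """us(O): apply u_1, ..., u_n in order."""
    return reduce(apply_change, log, ontology)


def pairwise_disjoint(log: List[Change]) -> bool:
    """No triple is added (or deleted) by two different changes of the log."""
    return all(not (a1 & a2) and not (d1 & d2)
               for (_, a1, d1), (_, a2, d2) in combinations(log, 2))


LOG: List[Change] = [
    ("Delete", frozenset(), frozenset({("has_gender", "Person", "Gender")})),
    ("Move", frozenset({("has_cont_point", "Person", "Cont_Point")}),
             frozenset({("has_cont_point", "Actor", "Cont_Point")})),
    ("Merge", frozenset({("domain", "Cont_Point", "address")}),
              frozenset({("domain", "Cont_Point", "street"),
                         ("domain", "Cont_Point", "city")})),
    ("Rename", frozenset({("domain", "Person", "fullname")}),
               frozenset({("domain", "Person", "name")})),
]

# Hypothetical fragment of the old version: only the triples touched by the log
O1 = frozenset({
    ("has_gender", "Person", "Gender"),
    ("has_cont_point", "Actor", "Cont_Point"),
    ("domain", "Cont_Point", "street"),
    ("domain", "Cont_Point", "city"),
    ("domain", "Person", "name"),
})

O2 = apply_log(O1, LOG)
```

Applying the log yields the fragment {has_cont_point(Person, Cont_Point), domain(Cont_Point, address), domain(Person, fullname)}: four high-level changes describe what would otherwise be a longer list of primitive additions and deletions.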
9
4. Data Integration Redefined
[Figure: an RDF/S ontology connected to the sources through mappings.]
Definition (Data Integration). A data integration system I is a quadruple (O, E, S, M), where O is a version of the ontology, E is the evolution log of the ontology (between the ontology versions under consideration), S is the set of local sources, and M is the mapping between S and one version O_i.
10
4.1 Affecting Change Operations
Definition (Affecting change operation). A change operation u affects the query Q with graph pattern G, denoted $u \Diamond Q$, if $\delta_d(u) \neq \emptyset$ and there exists a triple pattern $t \in G$ that can be unified with a triple of $\delta_d(u)$.
Definition (Valid Rewriting). Let q be a query expressed over $O_1$, and us a sequence of change operations such that $us(O_1) = O_2$. Then q′ is a valid rewriting of q over $O_2$ using us if, for every $u_i \in us$ such that $u_i \Diamond q$, it holds that $|\delta_a(u_i)| > 0$ and $\forall t \in \delta_d(u_i),\ t \Diamond q$, and q′ is constructed as $q' := (q \setminus \delta_d(u_i)) \cup \delta_a(u_i)$.
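A Python sketch of the two definitions, under simplifying assumptions of our own: the query's graph pattern is represented by the schema-level triples it uses (the same shape as the triples in δ_a and δ_d), terms starting with "?" are variables, and each u_i is tested against the query as rewritten so far. All function names are hypothetical. Each change scans the query's patterns once, in line with the O(N·T) bound stated for valid rewritings (treating the size of each δ_d as bounded).

```python
from typing import Dict, FrozenSet, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]                         # predicate, subject, object
Change = Tuple[FrozenSet[Triple], FrozenSet[Triple]]  # (delta_a, delta_d)


def unifies(pattern: Triple, triple: Triple) -> bool:
    """Can the pattern ('?'-prefixed terms are variables) be unified with the triple?"""
    binding: Dict[str, str] = {}
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if binding.setdefault(p, t) != t:  # variable already bound to another term
                return False
        elif p != t:
            return False
    return True


def used_by(triple: Triple, q: FrozenSet[Triple]) -> bool:
    """t ◊ q: some triple pattern of q unifies with t."""
    return any(unifies(p, triple) for p in q)


def affects(u: Change, q: FrozenSet[Triple]) -> bool:
    """u ◊ q: u deletes something, and a pattern of q unifies with a deleted triple."""
    _, deleted = u
    return bool(deleted) and any(used_by(d, q) for d in deleted)


def valid_rewriting(q: Iterable[Triple], us: List[Change]) -> Optional[FrozenSet[Triple]]:
    """Apply q' := (q minus delta_d(u_i)) ∪ delta_a(u_i) per affecting u_i, else None."""
    current = frozenset(q)
    for u in us:
        if not affects(u, current):
            continue
        added, deleted = u
        # |delta_a(u_i)| > 0, and every deleted triple must be used by the query
        if not added or not all(used_by(d, current) for d in deleted):
            return None
        current = frozenset(p for p in current
                            if not any(unifies(p, d) for d in deleted)) | added
    return current
```

With the four example changes, a query over name, has_cont_point from Actor, street and city is rewritten to one over fullname, has_cont_point from Person and address. A query that uses has_gender has no valid rewriting, because u1 deletes that triple without adding anything.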
11
4.2 Query Answering Semantics
Definition (Equivalent query rewriting) (Lenzerini, 2002). Let $O_1$, $O_2$ be two ontology versions, E a set of dependencies over $O_1$ and $O_2$, and $q_2$ an $O_2$-query. An equivalent rewriting of $q_2$ in the presence of E is an $O_1$-query $q_1$ such that $q_1$ gives the same answers as $q_2$ on every $O_1$ instance that satisfies E.
Theorem: Valid rewritings are equivalent query rewritings, and they can be computed in $O(N \cdot T)$ time, where N is the number of change operations in us and T is the number of triples in G.
12
4.3 Results
Proposition (Uniqueness): Valid rewritings are unique.
Proposition (Inverse Query Rewriting): If $q_2$ is a query over $O_2$ and E is the evolution log from $O_1$ to $O_2$, an equivalent rewriting of $q_2$ over $O_1$ can be produced by computing the valid rewriting of $q_2$ on the sequence of inverted changes of E.
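The inverse query rewriting proposition suggests inverting a log by swapping δ_a and δ_d of every change and reversing the order of the changes. A minimal sketch, restricted for brevity to ground (variable-free) query patterns so that unification reduces to equality; `invert` and `rewrite_ground` are hypothetical names, and changes are (δ_a, δ_d) pairs of (predicate, subject, object) triples.

```python
from typing import FrozenSet, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]
Change = Tuple[FrozenSet[Triple], FrozenSet[Triple]]  # (delta_a, delta_d)


def invert(log: List[Change]) -> List[Change]:
    """Inverse log: what was added is now deleted (and vice versa), in reverse order."""
    return [(deleted, added) for added, deleted in reversed(log)]


def rewrite_ground(query: Iterable[Triple], log: List[Change]) -> Optional[FrozenSet[Triple]]:
    """Valid-rewriting sketch for ground query patterns (unification is equality)."""
    q = set(query)
    for added, deleted in log:
        if not (deleted & q):
            continue              # the change does not affect the query
        if not added or not deleted <= q:
            return None           # conditions of a valid rewriting fail
        q = (q - deleted) | added
    return frozenset(q)


E: List[Change] = [
    (frozenset(), frozenset({("has_gender", "Person", "Gender")})),
    (frozenset({("has_cont_point", "Person", "Cont_Point")}),
     frozenset({("has_cont_point", "Actor", "Cont_Point")})),
    (frozenset({("domain", "Cont_Point", "address")}),
     frozenset({("domain", "Cont_Point", "street"), ("domain", "Cont_Point", "city")})),
    (frozenset({("domain", "Person", "fullname")}),
     frozenset({("domain", "Person", "name")})),
]

q2 = {("domain", "Person", "fullname"), ("has_cont_point", "Person", "Cont_Point"),
      ("domain", "Cont_Point", "address")}
q1 = rewrite_ground(q2, invert(E))
```

Here `q1` uses name, has_cont_point from Actor, street and city again. The inverse of u1 only adds has_gender and deletes nothing, so it never affects a query.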
13
4.3 Example
[Figure: the initial query, over the older ontology version, retrieves ?NAME, ?SSN, ?STREET and ?CITY through the properties name, ssn, has_cont_point, street and city; the rewritten query, over the newer version, retrieves ?NAME, ?SSN and ?Address through fullname, ssn, has_cont_point and address.]
14
5. Problems & Solutions
[Figure: example ontology versions with classes Person, Actor and Cont. Point, and properties fullname, ssn, has_cont_point and address.]
Problem identification: a class is deleted, but there exists a parent class that maintains all of its properties.
Problem resolution: use that parent class to find more general answers.
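The resolution step can be sketched as follows. This illustrates the idea on the slide and is not the authors' procedure: it assumes a simplified query encoding of (class, property, variable) patterns, plus hypothetical `parent_of` and `props_of` dictionaries describing the new ontology version. The deleted class is replaced by its parent only if the parent still has every property the query uses on that class.

```python
from typing import Dict, FrozenSet, Optional, Set, Tuple

Pattern = Tuple[str, str, str]  # (class, property, variable), e.g. ("Actor", "ssn", "?SSN")


def generalize(query: FrozenSet[Pattern], deleted_class: str,
               parent_of: Dict[str, str],
               props_of: Dict[str, Set[str]]) -> Optional[FrozenSet[Pattern]]:
    """Replace a deleted class by its parent class, or return None if that is unsafe."""
    parent = parent_of.get(deleted_class)
    if parent is None:
        return None  # no parent class: no more general answers can be offered
    used = {prop for cls, prop, _ in query if cls == deleted_class}
    if not used <= props_of.get(parent, set()):
        return None  # the parent does not maintain all the properties the query needs
    return frozenset((parent if cls == deleted_class else cls, prop, var)
                     for cls, prop, var in query)


if __name__ == "__main__":
    q = frozenset({("Actor", "ssn", "?SSN"), ("Actor", "fullname", "?NAME")})
    print(generalize(q, "Actor", {"Actor": "Person"},
                     {"Person": {"fullname", "ssn", "has_cont_point"}}))
```

Because every instance of a subclass is also an instance of its parent, the generalized query returns a superset of the originally intended answers, which is what "more general answers" amounts to here.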
15
6.1 System Architecture
DlvHex prototype (Polleres, 2007)
16
6.2 Source Rewriter
Traditionally, the problem was to find the maximally contained rewriting of a single user query, using algorithms such as MiniCon (Pottinger, 2001), the Bucket algorithm, or Inverse Rules. Now there are several queries, one for each ontology version, and information might need to be combined across ontology versions.
17
6.3 Source Rewriter
We reuse the best algorithm for finding maximally contained rewritings, but adapt it to multiple queries.
Properties of the algorithm: it is sound and complete, with complexity $O(q \cdot (n m M)^n)$, where q is the number of valid rewritings, n is the number of subgoals in the biggest query, m is the maximal number of subgoals in a view, and M is the number of mappings.
Algorithm 3.3: EDI-Minicon(Q, M)
Input: Q, a set of datalog queries; M, the mappings
Output: MQ, the set of maximally contained rewritings
1. Initialize MCD := {}, MQ := {}
2. For each q_j in Q:
3.     MCD_j := FormMCDs(q_j, M)
4.     Add MCD_j to MCD
5. For each q_j in Q:
6.     mq_j := CombineMCDs(MCD, q_j)
7.     Add mq_j to MQ
8. Return MQ
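The control flow of EDI-Minicon can be sketched in Python with the two MiniCon phases passed in as functions. FormMCDs and CombineMCDs (Pottinger, 2001) are not implemented here, and all parameter names are hypothetical. The point the sketch makes explicit is that, unlike single-query MiniCon, the combination step for each query receives the MCDs formed for the queries of all ontology versions.

```python
from typing import Callable, List, Sequence, TypeVar

Query = TypeVar("Query")
Mapping = TypeVar("Mapping")
MCD = TypeVar("MCD")
Rewriting = TypeVar("Rewriting")


def edi_minicon(
    queries: Sequence[Query],
    mappings: Sequence[Mapping],
    form_mcds: Callable[[Query, Sequence[Mapping]], List[MCD]],
    combine_mcds: Callable[[List[List[MCD]], Query], Rewriting],
) -> List[Rewriting]:
    """Two-phase driver of EDI-Minicon; the MiniCon phases are supplied by the caller."""
    all_mcds: List[List[MCD]] = []
    # Phase 1: form the MCDs of every valid rewriting (one query per ontology version)
    for q in queries:
        all_mcds.append(form_mcds(q, mappings))
    # Phase 2: combine MCDs per query, with access to the MCDs of *all* versions,
    # so that information from different ontology versions can be joined
    return [combine_mcds(all_mcds, q) for q in queries]
```

Since phase 1 runs once per query and phase 2 once per query, the work grows linearly with the number q of valid rewritings on top of the per-query MiniCon cost, matching the factor q in the stated complexity.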
18
7. Preliminary Evaluation
Ontology: CIDOC-CRM, with 80 classes, 250 properties and 726 changes (01.02.02-01.06.05).
Queries: 50 real user queries from 3D-COFORM.
Adding and restructuring information does not affect valid rewritings; deleting information, however, does. In general, assuming queries over CIDOC-CRM v4.2, we would be able to rewrite 89% of them.
19
7.4 Problems: Fiction or Reality?
In general, assuming queries over CIDOC-CRM v4.2, we would be able to rewrite 89% of them to v3.2.1.
[Figure: along the time axis, a version containing A, B, D evolves into one containing A, B, C through "Del D, Add C"; the inverse changes are "Add D, Del C".]
It makes no sense to search for C in previous versions. In fact, valid rewritings can provide access to 99% of the source information.
20
7.2 Average Running Time: 0.06 msec
21
8.1 Advantages of Our Approach
We rewrite the query rather than all the mappings, exploiting the locality of the query.
Mappings are produced only once and can be validated by domain experts, which greatly reduces human effort and time.
Our approach works independently of the family of mappings to the sources (GAV, LAV, GLAV, nested, etc.).
The mappings to the sources are not affected at all, so they maintain their initial semantics.
Modularity and scalability: new mappings or ontology changes can be defined independently.
We use high-level changes to model ontology evolution: they can model complex ontology evolution, reduce the size of the evolution log, and can be provided efficiently for two ontology versions.
22
8.2 Advantages of Our Approach
Valid rewritings: we define the answer semantics in such a setting, and precise criteria exist for deciding when it is possible to compute valid rewritings, with low complexity. Even when no valid rewriting exists, smart things can be done, such as returning more general answers, and we can guide the user in redefining the mappings.
Computing source rewritings: the increase in computational complexity is linear in the number of input queries, so the approach remains scalable.
23
8.3 Conclusions
Ontology evolution is a reality, and data integration systems should be aware of it. We have shown how to answer queries over multiple ontology versions; to the best of our knowledge, no system today is capable of query answering over multiple ontology versions.
Future work: a more extensive evaluation using the Gene Ontology; the Semantic Infrastructure for plugIT; integrating our system into Protégé; the MASTRO system; extending our approach to OWL variants; and considering RDF sources and their evolution as well.
24
References
1. Philip A. Bernstein, Todd J. Green, Sergey Melnik, Alan Nash: Implementing mapping composition. VLDB J. 17(2):333-353 (2008)
2. Vicky Papavassiliou, Giorgos Flouris, Irini Fundulaki, Dimitris Kotzinos, Vassilis Christophides: On Detecting High-Level Changes in RDF/S KBs. ISWC 2009:473-488
3. Maurizio Lenzerini: Data Integration: A Theoretical Perspective. PODS 2002:233-246
4. Rachel Pottinger, Alon Y. Halevy: MiniCon: A scalable algorithm for answering queries using views. VLDB J. 10(2-3):182-198 (2001)
5. Axel Polleres: From SPARQL to rules (and back). WWW 2007:787-796
6. Yannis Tzitzikas, Dimitris Kotzinos: (Semantic web) evolution through change logs: Problems and solutions. Artificial Intelligence and Applications 2007:654-659
7. Yannis Velegrakis, Renée J. Miller, Lucian Popa, John Mylopoulos: ToMAS: A System for Adapting Mappings while Schemas Evolve. ICDE 2004:862
25
Questions?