QoM: Qualitative and Quantitative Measure of Schema Matching Naiyana Tansalarak and Kajal T. Claypool (Kajal Claypool - presenter) University of Massachusetts,

Presentation transcript:

QoM: Qualitative and Quantitative Measure of Schema Matching
Naiyana Tansalarak and Kajal T. Claypool (Kajal Claypool, presenter)
University of Massachusetts, Lowell
Oct 14th, International Conference on Conceptual Modeling (ER) 2003, Chicago, Illinois

Introduction
Integration of information - A big challenge!
“Data data everywhere and ……
Problem: Heterogeneous data sources
– Concepts: protein sequence, grams of protein
– Semantics: “protein” for a protein scientist vs “protein” for a nutritionist
– Data Formats: XML, object oriented, relational
– Access Methods: special purpose programs (BLAST), SQL, XQuery
“need a way to integrate”

Integration of heterogeneous sources
[Figure: Source 1, Source 2, ..., Source n combined into Integrated Sources]
Problems:
– Resolve conflicts
– Integrate data
– Interpret results
Goal:
– Automated / semi-automated integration via “Schema Matching”

Schema Matching - The Process
Schema matching: the process of finding “semantic correspondences” between the entities of two or more schemas
– Input: two schemas
– Output: set of matches between the two schemas
Two entities match if their similarity value is above a threshold.
Similarity values and thresholds are tightly coupled to the algorithm.
– Example: CUPID [MBR01] defines the similarity value as the fraction of leaves in the two subtrees that have at least one “strong” link to a leaf in the other subtree. A linguistic algorithm’s similarity values are based on the level of matching in a hypernym tree.
– Thresholds are ad hoc
Problem: A match from one algorithm may not be considered a match by another algorithm!
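The threshold rule and the problem it creates can be illustrated with a small sketch. Everything here is an assumption made for illustration: the two similarity functions and both thresholds are hypothetical stand-ins, not CUPID or any other algorithm named in the talk. The point is only that the same pair of labels can pass one algorithm's threshold and fail another's.

```python
from difflib import SequenceMatcher


def char_similarity(a: str, b: str) -> float:
    """Hypothetical matcher A: character-level similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_similarity(a: str, b: str) -> float:
    """Hypothetical matcher B: Jaccard overlap of underscore-separated tokens."""
    tokens_a, tokens_b = set(a.lower().split("_")), set(b.lower().split("_"))
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def is_match(similarity, threshold: float, a: str, b: str) -> bool:
    """Two entities match if their similarity value is above the threshold."""
    return similarity(a, b) > threshold


# Each algorithm carries its own ad hoc threshold; both values are assumptions.
pair = ("first_name", "name")
match_a = is_match(char_similarity, 0.5, *pair)   # ratio is 8/14, above 0.5
match_b = is_match(token_similarity, 0.6, *pair)  # overlap is 1/2, not above 0.6
print(match_a, match_b)
```

Because the two similarity values live on different scales, neither the raw scores nor the yes/no outcomes are directly comparable, which is the gap a common quality metric is meant to close.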

Contributions of Our Work
Proposal of QoM - Quality of Match metric
– A metric for comparing different matches produced by the match algorithms
Measurement of QoM
– Qualitative measure: Match Taxonomy
– Quantitative measure: Weight-based Match Model

Outline
Motivation
Our Approach
– Unifying Data Model: UML
– Match Taxonomy
– Weight-based Match Model
Related Work
Conclusions and Future Work

Unifying Data Model
[Figure: two schemas shown side by side (“VS.”) and expressed in the UML Model]

Definition
– a schema S =
– a class c =
– an attribute a =
– a method m =


Qualitative Measure: Taxonomy of Schema Matches
Goal: Describe the “quality” and the “coverage” of a match
– Attribute/Method Level … Micro Match
– Class Level … Sub-Macro Match
– Schema Level … Macro Match

Micro Match
Attributes can be compared based on: label (L), scope (A), type (T), atomicity (N), initializer (I)
A match can be:
– Exact
  Labels: exact string match or synonyms (name vs name)
  Other properties: equivalent values (String vs char[])
– Relaxed
  Labels: “almost same”, same hypernym tree (firstName vs name)
  Other properties: implied values (protected vs private)
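A minimal sketch of the exact/relaxed distinction for attributes follows. The lookup tables are hypothetical and cover only the examples on this slide; only label, scope, and type are compared (atomicity and initializer are left out for brevity); and the aggregation rule in `micro_match` (exact only if every compared property is exact) is an assumption, since the transcript does not spell it out.

```python
# Hypothetical lookup tables; a real matcher would consult a thesaurus or type system.
SYNONYMS = {frozenset({"qty", "quantity"})}
HYPERNYM = {"firstName": "name", "lastName": "name"}  # label -> broader term
EQUIVALENT = {frozenset({"String", "char[]"})}         # equivalent values
IMPLIED = {frozenset({"protected", "private"})}        # implied values


def label_match(ls, lt):
    """Return 'exact', 'relaxed', or None for a pair of labels."""
    if ls == lt or frozenset({ls, lt}) in SYNONYMS:
        return "exact"
    if HYPERNYM.get(ls, ls) == HYPERNYM.get(lt, lt):
        return "relaxed"
    return None


def property_match(vs, vt):
    """Return 'exact', 'relaxed', or None for a non-label property pair."""
    if vs == vt or frozenset({vs, vt}) in EQUIVALENT:
        return "exact"
    if frozenset({vs, vt}) in IMPLIED:
        return "relaxed"
    return None


def micro_match(a_s, a_t):
    """Assumed aggregation: exact only if every compared property is exact."""
    results = [label_match(a_s["label"], a_t["label"])]
    results += [property_match(a_s[k], a_t[k]) for k in ("scope", "type")]
    if None in results:
        return None
    return "exact" if all(r == "exact" for r in results) else "relaxed"


exact_pair = micro_match(
    {"label": "name", "scope": "private", "type": "String"},
    {"label": "name", "scope": "private", "type": "char[]"},
)
relaxed_pair = micro_match(
    {"label": "firstName", "scope": "protected", "type": "String"},
    {"label": "name", "scope": "private", "type": "String"},
)
print(exact_pair, relaxed_pair)
```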

Example
[Figure: attribute name in the source and attribute name in the target: Exact Match]

Example
[Figure: attribute pairs for name and qty: Relaxed Match]

Sub-Macro Match
Classes can be compared based on:
– The quality of match of their attributes (micro match): Exact vs Relaxed
– The “coverage”: the number of micro matches between the source and the target classes
  Total Match: all attributes of the source have a match in the target
  Partial Match: some, but not all, attributes of the source have a match in the target

Sub-Macro Match (Cont’d)
– Total Exact Match
– Partial Exact Match
– Total Relaxed Match
– Partial Relaxed Match

Example
Recipe vs Dish (attributes name, desc): Total Exact Match

Example
Ingredient (id, qty, item) and Item (qty, item)
– Ingredient to Item: Partial Exact Match
– Item to Ingredient: Total Exact Match
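The Ingredient/Item example can be reproduced with a short sketch of the class-level taxonomy. The function names are hypothetical, the micro matcher is a deliberately naive label comparison, and the rule that a class match is exact only when all of its micro matches are exact is an assumption rather than something stated in the slides.

```python
def classify_class_match(source_attrs, target_attrs, micro_match):
    """Label a source-to-target class match, e.g. 'partial exact', or None."""
    best = []  # best micro-match quality found for each matched source attribute
    for s in source_attrs:
        found = {micro_match(s, t) for t in target_attrs} - {None}
        if found:
            best.append("exact" if "exact" in found else "relaxed")
    if not best:
        return None
    # Coverage is judged from the source side: total means every source
    # attribute found a partner in the target.
    coverage = "total" if len(best) == len(source_attrs) else "partial"
    quality = "exact" if all(q == "exact" for q in best) else "relaxed"
    return f"{coverage} {quality}"


def by_label(s, t):
    """Hypothetical micro matcher: identical labels count as an exact match."""
    return "exact" if s == t else None


ingredient, item = ["id", "qty", "item"], ["qty", "item"]
print(classify_class_match(ingredient, item, by_label))  # partial exact
print(classify_class_match(item, ingredient, by_label))  # total exact
```

The asymmetry comes from the direction of the comparison: id in Ingredient has no counterpart in Item, so only the Item-to-Ingredient direction is total.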

Macro Match
Schemas can be compared in a similar manner to classes, based on:
– The quality of match of their classes (sub-macro match): Total Exact, Total Relaxed, Partial Exact and Partial Relaxed
– The “coverage”: the number of sub-macro matches between the source and the target schemas
  Total: all classes of the source have a match in the target
  Partial: some, but not all, classes of the source have a match in the target

Macro Match (Cont’d)
– Total Exact Match
– Partial Exact Match
– Total Relaxed Match
– Partial Relaxed Match

Example
Recipe schema (Recipe, Ingredient, Instruction) and Dish schema (Dish, Item, Step)
– Recipe to Dish: Partial Exact Match (PE)
– Dish to Recipe: Total Exact Match (TE)

Quantitative Measure: Weight-Based Measure of QoM
Match Taxonomy:
– Qualitative measure of match between two entities
– Can distinguish between a total exact and a partial exact match, or a total exact and a partial relaxed match
– Cannot decide if one partial exact match is better than another, or if a total relaxed match is better than a partial exact match
Weight-Based Measure:
– Provides a quantitative metric for the QoM

Weight-based Match Model
Match Value: the “weight” of each match operator, representing the match between two properties
[Table: Match operator, Weight]
Example: label match on Name, W(l_s, l_t) = 1.0

What has been done - Related Work
Domain-specific [BHP94, BCVB01, BM01] and domain-independent [HMN+99, MBR01, DR02] algorithms
Approaches exploit various types of information
– Element names, structural properties, ontologies, characteristics of data instances
Examples:
– Doan et al. [DDH01]
  Combines match predictions using a set of machine learning techniques
  Match predictions based on element name matching, content matching, text classification and domain knowledge
– Madhavan et al. (Cupid) [MBR01]
  Hybrid algorithm: combines linguistic and structural match algorithms

Conclusions and Future Work
Contributions:
– Proposed QoM: a quality metric for schema matches
– Two techniques to evaluate the QoM
  Qualitative: Match Taxonomy
  Quantitative: Weight-based Match Model
Future Work:
– Combining “user input” for desired matches to optimize the schema match process
– Refinement of QoM for the XML model
  Accounting for order and the different levels of nesting
– Development of match algorithms based on QoM

More Information: http: // //

Micro Match Model
Attribute:
\[ QoM(a_s, a_t) = \frac{W(L_s, L_t) + W(A_s, A_t) + W(T_s, T_t) + W(N_s, N_t) + W(I_s, I_t)}{5} \]
Method:
\[ QoM(m_s, m_t) = \frac{QoM_{sig}(m_s, m_t) + 2 \cdot QoM_{spec}(m_s, m_t)}{3} \]
\[ QoM_{sig}(m_s, m_t) = \frac{W(A_s, A_t) + W(O_s, O_t) + W(I_s, I_t)}{3} \]
\[ QoM_{spec}(m_s, m_t) = \frac{W(pre_s, pre_t) + W(post_s, post_t)}{2} \]
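The micro-level formulas translate directly into code. The sketch below mirrors the four definitions on this slide; the function and parameter names are chosen here for illustration, and the relaxed weight of 0.5 is an assumed value (the transcript only preserves the weight 1.0 for an exact label match).

```python
def qom_attribute(w_l, w_a, w_t, w_n, w_i):
    """Mean of the five property weights W(L), W(A), W(T), W(N), W(I)."""
    return (w_l + w_a + w_t + w_n + w_i) / 5


def qom_signature(w_a, w_o, w_i):
    """Mean of the three signature weights W(A), W(O), W(I)."""
    return (w_a + w_o + w_i) / 3


def qom_specification(w_pre, w_post):
    """Mean of the precondition and postcondition weights."""
    return (w_pre + w_post) / 2


def qom_method(sig, spec):
    """The specification part counts twice as much as the signature part."""
    return (sig + 2 * spec) / 3


EXACT, RELAXED = 1.0, 0.5  # RELAXED = 0.5 is an assumed weight, not from the slides

all_exact = qom_attribute(EXACT, EXACT, EXACT, EXACT, EXACT)
one_relaxed = qom_attribute(RELAXED, EXACT, EXACT, EXACT, EXACT)
method = qom_method(
    qom_signature(EXACT, EXACT, EXACT),
    qom_specification(EXACT, RELAXED),
)
print(all_exact, one_relaxed, method)
```

Under these assumed weights, a single relaxed property lowers an attribute's QoM from 1.0 to 0.9, while a relaxed postcondition costs a method more because the specification term is doubled.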

Example
[Figure: The Recipe Schema and The Dish Schema]

Micro Match (Attribute)
Attribute a_s vs attribute a_t, compared property by property:
– L_s vs L_t
– A_s vs A_t
– T_s vs T_t
– N_s vs N_t
– I_s vs I_t
Each comparison is either an Exact Match or a Relaxed Match.

Micro Match (Method)
Method m_s vs method m_t, compared property by property:
– A_s vs A_t
– O_s vs O_t
– I_s vs I_t
– Pre_s vs Pre_t
– Post_s vs Post_t
Each comparison is either an Exact Match or a Relaxed Match.

Weighing the Micro Match
Match between attributes is based on the match of the individual properties
– Exact or relaxed
QoM(a_s, a_t):
– Quantitative measure of the match between attributes a_s and a_t
– The normalized sum of the match values of the individual properties of an attribute

Example
QoM(name_Recipe, name_Dish) = 1.0

Weighing the Sub-Macro Match
Sub-Macro match:
– Normalized sum of QoM of micro matches:
\[ R_W(C_s, C_t) = \frac{\sum QoM(M_s, M_t)}{|C_s|} \]
– Coverage:
\[ R_S(C_s, C_t) = \frac{|C^m_s|}{|C_s|}, \qquad R_T(C_s, C_t) = \frac{|C^m_s|}{|C_t|} \]
– Sub-Macro Match:
\[ QoM(C_s, C_t) = \frac{R_W(C_s, C_t) + R_S(C_s, C_t) + R_T(C_s, C_t)}{3} \]

Example
Instruction (id, step) and Step (direction)
QoM(step_Instruction, direction_Step) = 0.9
R_W(Instruction, Step) = 0.9 / 2 = 0.45
R_S(Instruction, Step) = 1 / 2 = 0.5
R_T(Instruction, Step) = 1 / 1 = 1.0
QoM(Instruction, Step) = (0.45 + 0.5 + 1.0) / 3 = 0.65
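The Instruction/Step numbers can be checked with a few lines of code. This is a sketch under one assumption: the denominators are read off the worked example (R_W and R_S divide by the number of source attributes, R_T by the number of target attributes). The function name and argument layout are chosen here for illustration.

```python
def qom_class(micro_qoms, n_source, n_target):
    """Sub-macro QoM from the QoMs of the matched attribute pairs.

    micro_qoms: one QoM value per matched source attribute.
    n_source, n_target: attribute counts of the source and target class.
    """
    matched = len(micro_qoms)
    r_w = sum(micro_qoms) / n_source   # normalized sum of micro-match QoMs
    r_s = matched / n_source           # coverage relative to the source
    r_t = matched / n_target           # coverage relative to the target
    return r_w, r_s, r_t, (r_w + r_s + r_t) / 3


# Instruction has 2 attributes (id, step); Step has 1 (direction);
# the single micro match step/direction has QoM 0.9.
r_w, r_s, r_t, qom = qom_class([0.9], n_source=2, n_target=1)
print(r_w, r_s, r_t, round(qom, 2))
```

Rounded to two decimals, the result agrees with the slide's values of 0.45, 0.5, 1.0 and 0.65.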

Weighing the Macro Match
Macro match:
– Normalized sum of sub-macro QoMs:
\[ R_W(S_s, S_t) = \frac{\sum QoM(C_s, C_t)}{|S_s|} \]
– Coverage:
\[ R_S(S_s, S_t) = \frac{|S^m_s|}{|S_s|}, \qquad R_T(S_s, S_t) = \frac{|S^m_s|}{|S_t|} \]
– Macro Match:
\[ QoM(S_s, S_t) = \frac{R_W(S_s, S_t) + R_S(S_s, S_t) + R_T(S_s, S_t)}{3} \]

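Example
Recipe schema (Recipe, Ingredient, Instruction) and Dish schema (Dish, Item, Direction)
[Figure: class pairs between the two schemas; one pair is labeled 1.00]
R_W(RECIPE, DISH) = ( ) / 3 = 0.81
R_S(RECIPE, DISH) = 3 / 3 = 1.0
R_T(RECIPE, DISH) = 3 / 3 = 1.0
QoM(RECIPE, DISH) = (0.81 + 1.0 + 1.0) / 3 = 0.94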
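The schema-level score follows the same pattern one level up. In the sketch below the function name is hypothetical, R_W is taken as given (0.81, as stated on the slide) because the individual class-level QoMs are not all preserved in the transcript, and the coverage ratios are computed from the class counts.

```python
def qom_schema(r_w, matched_classes, n_source, n_target):
    """Macro QoM from the normalized class-level sum and the class coverage."""
    r_s = matched_classes / n_source   # matched classes over source classes
    r_t = matched_classes / n_target   # matched classes over target classes
    return (r_w + r_s + r_t) / 3


# RECIPE and DISH each have 3 classes, and all 3 source classes are matched.
schema_qom = qom_schema(r_w=0.81, matched_classes=3, n_source=3, n_target=3)
print(round(schema_qom, 2))
```

With full coverage on both sides, the macro score is limited only by R_W, which is why 0.81 at the class level lifts to 0.94 at the schema level.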