Automatic Schema Matching Nicole Oldham CSCI 8350 (Semantic Web Univ of Georgia) Topic Presentation.

Outline
- Introduction
- Application Domains
- Classification of Schema Matching Approaches
- Current Work
- MWSAF Matching
- Open Research Directions
- Conclusion

Schema Matching
Match: takes two schemas as input and produces a mapping between the elements that correspond to each other semantically.
It is usually performed manually, which is:
- Tedious
- Time consuming
- Error prone
- Expensive
We must automate this process!

Example
GTE Telecommunications needed to integrate 40 databases with a total of 27,000 elements. Project planners estimated that manual matching would take 12 person-years.
Doan A, Halevy A. Semantic Integration Research in the Database Community: A Brief Survey.

Various Levels of Heterogeneity
ftp://ftp.dagstuhl.de/pub/Proceedings/04/04391/04391.ChristophidesVassilis.Slides.pdf

How to Deal with Semantic Heterogeneity
1. Standardize: agree on a common representation
2. Translate: create mappings between different schemas
   - requires human input and machine reasoning
   - mappings can be difficult and expensive
3. Annotate: create relationships between agreed upon conceptualizations
   - requires human input and machine reasoning
   - annotation can be difficult and expensive
ftp://ftp.dagstuhl.de/pub/Proceedings/04/04391/04391.ChristophidesVassilis.Slides.pdf

Challenges
- The actual semantics of the involved elements are typically available only from the creators or the documentation, so we must use clues in the schema and data instead.
- These clues are often misleading.
  - e.g. ‘Area’ can refer to different entities.
  - e.g. the same entity can have very different names.
- Clues are often ambiguous.
  - e.g. ‘Contact-agent’: agent name or phone number?
- The matching process can be very costly: each element of the schema must be examined to ensure discovery of the best match.
- Matching is often subjective, depending on the application.
Doan A, Halevy A. Semantic Integration Research in the Database Community: A Brief Survey.

Where is Schema Matching used?
Database Application Domains
- Data Integration
- Data Warehousing
- E-Business
- Query Processing
Semantic Web
- XML/HTML to an Ontology
- Semantic Web Services
Bernstein P, Rahm E. A survey of approaches to automatic schema matching

Schema Integration
Problem: Construct a global view from a set of independently constructed schemas (e.g. ontologies).
- Different structures and terminologies
Solution: Schema matching is performed to find relationships between concepts in each schema. Then the matching elements can be unified.
Bernstein P, Rahm E. A survey of approaches to automatic schema matching

Data Warehouses Problem: Integrating data sources into a data warehouse. - Different formats between the source and warehouse. Solution: Use matching to find the elements of the source that are also present in the warehouse. Then the details of the semantics can be examined to integrate the two. Bernstein P, Rahm E. A survey of approaches to automatic schema matching

E-Commerce Problem: Message translation. -Each trading partner uses its own message format. Solution: A match operation would reduce the amount of manual work to specify how the formats are related. Bernstein P, Rahm E. A survey of approaches to automatic schema matching

Query Processing Problem: The terms used in the user’s query may be different from those in the database. Solution: Matching is used to map the user-specified concepts in the query to schema elements. Bernstein P, Rahm E. A survey of approaches to automatic schema matching

Need for Data Integration on the Semantic Web Problem: Web documents are not in RDF or any form suitable for the SW. We must annotate them with concepts from ontologies. Solution: Use schema matching to map between elements represented in OWL and the different schemas of web documents.

Semantic Web Services Problem: Web Services are currently searched for using keywords. We need to annotate the WSDLs with semantic metadata so that they can be discovered efficiently. WSDLs are in XML, Ontologies in OWL! Solution: Use schema matching approaches to map between the two different schemas.

Term Definitions Schema: a set of elements connected by some structure. Mapping: a set of mapping elements, each of which indicates that certain elements of schema s1 are mapped to certain elements in s2. Mapping Expression: Tells how s1 and s2 elements are related. Bernstein P, Rahm E. A survey of approaches to automatic schema matching

Example
A mapping between s1 and s2 might contain these elements:
- Cust.C# = Customer.CustID
- Concatenate(Cust.FirstName, Cust.LastName) = Customer.contact
- Cust.CName = Customer.Company
  S1 Elements   S2 Elements
  Cust          Customer
    C#            CustID
    CName         Company
    FirstName     Contact
    LastName      Phone
Bernstein P, Rahm E. A survey of approaches to automatic schema matching
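
The three mapping elements above can be read as a small translation function. The following is a minimal sketch, assuming each schema record is a plain dictionary and that Concatenate joins the two names with a single space; the function name `translate_customer` and the sample values are hypothetical and do not come from the cited survey.

```python
def translate_customer(cust: dict) -> dict:
    """Apply the slide's three mapping expressions to one S1 'Cust' record."""
    return {
        # Cust.C# = Customer.CustID
        "CustID": cust["C#"],
        # Concatenate(Cust.FirstName, Cust.LastName) = Customer.contact
        # (joining with a space is an assumption)
        "Contact": f'{cust["FirstName"]} {cust["LastName"]}',
        # Cust.CName = Customer.Company
        "Company": cust["CName"],
    }


# Hypothetical sample record, for illustration only.
s1_row = {"C#": 42, "CName": "Acme", "FirstName": "Ada", "LastName": "Lovelace"}
s2_row = translate_customer(s1_row)
print(s2_row)
```

Note that Customer.Phone has no counterpart in S1, so the sketch leaves it out rather than inventing a value.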

Example Doan A, Halevy A. Semantic Integration Research in the Database Community: A Brief Survey.

Classification of Schema Matching Approaches Instance vs Schema: matching approaches can consider instance data or schema-level information. Element vs Structure matching: match can be performed for individual schema elements or combinations of elements. Language vs Constraint: linguistic (names) or constraint-based (keys and relationships). Matching Cardinality: match result may relate one or more elements of one schema to one or more elements of another. Auxiliary Information: matcher relies on other information besides the input schemas, such as dictionaries, user input, global schemas. Bernstein P, Rahm E. A survey of approaches to automatic schema matching

Classification of Schema Matching Approaches
Schema Matching Approaches
- Individual Matchers
  - Schema-only
    - Element Level: Linguistic, Constraint
    - Structure Level: Constraint
  - Instance/Contents
    - Element Level: Linguistic, Constraint
- Combining Matchers
  - Hybrid Matchers
  - Composite Matchers
    - Manual Composition
    - Automatic Composition
Further Criteria: Match Cardinality, Auxiliary information used, …
Sample Approaches: Name Similarity, Description Similarity, Global Namespaces, Word Frequency, Group Matching, Type Similarity, Key Properties, Value Pattern and Ranges
Bernstein P, Rahm E. A survey of approaches to automatic schema matching

Schema Level Matchers
Consider schema information instead of instance data: Name, Description, Data Type, Relationship Types, Constraints, Structure.
Often produce multiple candidates and estimate a degree of similarity for each.
1. Granularity of match (element level vs structure level)
2. Match Cardinality
3. Linguistic Approaches: Name or Description Matching
4. Constraint-Based Approaches
5. Reusing Schema and Matching Information
Bernstein P, Rahm E. A survey of approaches to automatic schema matching

Element-Level Element-Level: Identifies all elements of S1 that are the same or similar to elements of S2. The match comparison can be based on name, description, or data type of the element. Example of name-based element-level matching: Address = CustomerAddress Bernstein P, Rahm E. A survey of approaches to automatic schema matching

Structure-Level
Structure-Level: Matches combinations of elements that appear together in S1 with combinations of elements that appear together in S2.
Full Structure Match:
  S1 Elements   S2 Elements
  Address       CustAddress
    Street        Street
    City          City
    State         USState
    Zip           PostalCode
Partial Structure Match:
  S1 Elements     S2 Elements
  AccountOwner    Customer
    Name            Cname
    Address         CAddress
    Birthdate       CPhone
    TaxExempt
Equivalence Patterns: Can enhance structure matching by considering known equivalence patterns stored in a library.
Bernstein P, Rahm E. A survey of approaches to automatic schema matching

Match Cardinality
One or more S1 elements can match one or more S2 elements (complex matches).
Examples of the four local cardinality cases for individual mapping elements:
1:1, element level
  S1 element(s): Price
  S2 element(s): Amount
  Matching expression: Amount = Price
n:1, element level
  S1 element(s): Price, Tax
  S2 element(s): Cost
  Matching expression: Cost = Price*(1+Tax/100)
1:n, element level
  S1 element(s): Name
  S2 element(s): FirstName, LastName
  Matching expression: FirstName, LastName = Name
n:1, structure level (n:m, element level)
  S1 element(s): B.Title, B.PuNo, P.PuNo, P.Name
  S2 element(s): A.Book, A.Publisher
  Matching expression: A.Book, A.Publisher = Select B.Title, P.Name From B, P Where B.PuNo = P.PuNo
Bernstein P, Rahm E. A survey of approaches to automatic schema matching
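
The matching expressions for the non-trivial cardinality cases can be written out as functions. This is a minimal sketch under stated assumptions: rows are plain dictionaries, a name splits at its first space, and PuNo is unique within P. The function names are hypothetical.

```python
def cost(price: float, tax_percent: float) -> float:
    """n:1, element level: Cost = Price * (1 + Tax/100)."""
    return price * (1 + tax_percent / 100)


def split_name(name: str) -> tuple[str, str]:
    """1:n, element level: derive FirstName and LastName from Name.

    Splitting at the first space is an assumption; real names need more care.
    """
    first, _, last = name.partition(" ")
    return first, last


def book_publisher(b_rows: list[dict], p_rows: list[dict]) -> list[dict]:
    """n:1, structure level: join B and P on PuNo, mirroring the slide's SELECT."""
    # Assumes PuNo identifies at most one publisher row in P.
    publisher_by_no = {p["PuNo"]: p["Name"] for p in p_rows}
    return [
        {"Book": b["Title"], "Publisher": publisher_by_no[b["PuNo"]]}
        for b in b_rows
        if b["PuNo"] in publisher_by_no
    ]
```

The 1:1 case (Amount = Price) needs no function at all, which is why 1:1 matches are the easiest to discover and apply automatically.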

Complex Matches
- The number of 1:1 matches is bounded by the sizes of the schemas, but there is an unbounded number of functions for combining attributes in a schema.
- Only a few works on complex matching have been done. Some hard-code complex matches into rules. Some rely on a domain-specific ontology.
- We need domain knowledge to accurately perform complex matching.
- The best match isn’t always the top match returned by the matcher, so human involvement is still needed.
Doan A, Halevy A. Semantic Integration Research in the Database Community: A Brief Survey.

Linguistic Approaches Language based matchers use names and text (i.e. words or sentences) to find semantically similar schema elements. Name Matching: match elements with similar names Description Matching: match comments in the schemas Bernstein P, Rahm E. A survey of approaches to automatic schema matching

Linguistic Approaches: Name Matching
Matches schema elements with equal or similar names. How similarity is defined:
1. Equality of names
2. Equality of names after stemming, which deals with prefixes/suffixes
3. Equality of synonyms
4. Equality of hypernyms (an SUV is a type of car)
5. Similarity of names based on common substrings, soundex, pronunciation (ShipTo = Ship2)
6. User-provided name matches
Can be element or structure-level. Cardinality is not limited to 1:1.
Bernstein P, Rahm E. A survey of approaches to automatic schema matching
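
A few of the similarity definitions above can be sketched as a cascade that tries the strictest test first. This is an illustration only: the synonym table and the 0.9 score for synonyms are hypothetical, stemming, hypernyms and soundex are left out, and the common-substring step uses Python's standard `difflib.SequenceMatcher` as a stand-in for whatever measure a real matcher would use.

```python
from difflib import SequenceMatcher

# Hypothetical synonym table; a real matcher would consult a thesaurus.
SYNONYMS = {frozenset({"car", "automobile"})}


def name_similarity(a: str, b: str) -> float:
    """Return a similarity in [0, 1] for two element names."""
    x, y = a.lower(), b.lower()
    if x == y:
        return 1.0  # equality of names (ignoring case)
    if frozenset({x, y}) in SYNONYMS:
        return 0.9  # equality of synonyms; the score is an arbitrary choice
    # Fall back to similarity based on common substrings.
    return SequenceMatcher(None, x, y).ratio()


print(name_similarity("ShipTo", "Ship2"))
```

Ordering the tests from exact to approximate keeps the cheap, reliable signals from being diluted by the fuzzy fallback.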

Linguistic Approaches: Description Matching
Schemas can contain comments in natural language that express the intended semantics of the schema elements.
Example:
  S1: empn // employee name
  S2: name // name of employee
Can be as simple as keyword extraction and synonym matching, or as complex as using natural language understanding technology.
Bernstein P, Rahm E. A survey of approaches to automatic schema matching
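
The simple end of that spectrum, keyword extraction, can be sketched as follows. The stop-word list and the use of Jaccard overlap are assumptions chosen for brevity, and synonym matching is omitted.

```python
import re

# Hypothetical stop-word list, kept tiny for the example.
STOP_WORDS = {"of", "the", "a", "an", "for"}


def keywords(comment: str) -> set[str]:
    """Extract lowercase keywords from a schema comment."""
    return {w for w in re.findall(r"[a-z]+", comment.lower()) if w not in STOP_WORDS}


def description_similarity(c1: str, c2: str) -> float:
    """Jaccard overlap of the keyword sets of two comments."""
    k1, k2 = keywords(c1), keywords(c2)
    if not k1 or not k2:
        return 0.0
    return len(k1 & k2) / len(k1 | k2)


# The slide's example: empn and name carry equivalent comments.
print(description_similarity("employee name", "name of employee"))
```

Here the element names empn and name look quite different, but their comments reduce to the same keyword set, which is the point of description matching.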

Constraint Based Schemas often contain constraints to define data types and value ranges, optionality, relationship types, cardinalities, etc. Bernstein P, Rahm E. A survey of approaches to automatic schema matching
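
One way such constraints could feed a matcher is to score two elements by how many declared constraints agree. This is a minimal sketch, assuming each element's constraints are available as a dictionary; the keys (`type`, `nullable`, `is_key`) and the equal weighting are hypothetical.

```python
def constraint_similarity(a: dict, b: dict) -> float:
    """Fraction of shared constraint keys on which two elements agree."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    return sum(a[k] == b[k] for k in shared) / len(shared)


# Hypothetical constraint descriptions for two elements.
cust_id = {"type": "int", "nullable": False, "is_key": True}
c_number = {"type": "int", "nullable": True, "is_key": True}
print(constraint_similarity(cust_id, c_number))
```

On its own this signal is weak, since many unrelated elements share a type, so it is usually combined with a linguistic matcher.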

Reusing Schema and Mapping Information
- The effectiveness of matching can be improved by reusing common schema components and previously determined mappings.
- Many schemas are very similar to each other and to previously matched schemas.
  - e.g. in E-Commerce, substructures often repeat within different message formats (address fields, name fields)
- A schema library should be created, and schema editors should access the library to use predefined terms and definitions.
Bernstein P, Rahm E. A survey of approaches to automatic schema matching

Schema Mapping Reuse Example
Problems:
1. Determining which part of a new schema is similar to some part of a previously matched one is a match problem itself.
2. Similarity values may depend on the domain.
   - e.g. Salary and income may be identical in a payroll application but not in a tax reporting application
Schema S1:
  Purchase-order
    Product
    BillTo
      Name
      Address
    ShipTo
      Name
      Address
    ContactPhone
Schema S2:
  Purchase-order
    Product
    BillTo
      Name
      Address
    ShipTo
      Name
      Address
    Contact
      Name
      Address
Schema S:
  POrder
    Article
    Payee
    BillAddress
    Recipient
    ShipAddress
Bernstein P, Rahm E. A survey of approaches to automatic schema matching

Instance Level Approaches
Why?
1. Little or no schema information is available.
2. Enhancement of schema-level matchers: instance data gives insight into the contents and meaning of schema elements.
3. To match instance-level data.
How?
1. Preferred method: linguistic characterization
2. Constraint-based characterization, e.g. ranges
3. Auxiliary information
4. Both rule-based and learner-based techniques are also used.
Main problem: when comparing data at the instance level, there are likely to be a very large number of possible match combinations, many of which are irrelevant.
Bernstein P, Rahm E. A survey of approaches to automatic schema matching
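
Constraint-based characterization by ranges can be illustrated by comparing the value ranges observed in two columns. This is a minimal sketch, assuming numeric, non-empty columns; the overlap ratio used here is one simple choice and is not taken from the cited survey.

```python
def range_overlap(a_values: list[float], b_values: list[float]) -> float:
    """Overlap of the observed [min, max] ranges of two columns, in [0, 1]."""
    a_lo, a_hi = min(a_values), max(a_values)
    b_lo, b_hi = min(b_values), max(b_values)
    union = max(a_hi, b_hi) - min(a_lo, b_lo)
    if union == 0:
        return 1.0  # both columns hold the same single value
    intersection = min(a_hi, b_hi) - max(a_lo, b_lo)
    return max(0.0, intersection) / union


# Hypothetical instance data for two numeric columns.
print(range_overlap([0, 4, 10], [5, 15]))
```

Two columns with disjoint ranges score 0, which lets a matcher discard many irrelevant combinations cheaply before running more expensive comparisons.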

Rule Based Solutions
Rule-Based: hand-crafted rules that exploit schema information such as element names, data types, structures and subelements.
e.g. two elements match if they have the same name and the same number of subelements
Doan A, Halevy A. Semantic Integration Research in the Database Community: A Brief Survey.
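
The example rule translates almost directly into code. This is a minimal sketch; the `Element` class is a hypothetical stand-in for whatever schema representation a matcher uses.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical schema element: a name plus its subelements."""
    name: str
    children: list["Element"] = field(default_factory=list)


def rule_match(a: Element, b: Element) -> bool:
    """Two elements match if they have the same name and the same number of subelements."""
    return a.name == b.name and len(a.children) == len(b.children)


addr1 = Element("Address", [Element("Street"), Element("City")])
addr2 = Element("Address", [Element("Road"), Element("Town")])
print(rule_match(addr1, addr2))
```

The rule is cheap and needs no training data, but it cannot see that Road and Street mean the same thing, which is where learner-based techniques come in.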

Learner Based Solutions
Learner-Based: exploit both schema and data. They require a lot of training data, but in return can use the data instances in addition to schema information.
Rule-based and learner-based techniques combined provide an effective matching solution.
Doan A, Halevy A. Semantic Integration Research in the Database Community: A Brief Survey.

Combining Different Matchers
The ideal matching system must exploit many different types of information and techniques for maximum accuracy. More match candidates will be produced if the previous approaches are combined.
Two Combination Methods:
1. Hybrid: integrates multiple matching criteria. Better performance.
2. Composite: combines the results of independently executed matchers. More flexible. Can be done automatically or manually.
Bernstein P, Rahm E. A survey of approaches to automatic schema matching
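
The composite method can be sketched as a weighted average over independently executed matchers. Everything here is illustrative: the two matchers, the element dictionaries and the weights are hypothetical, and the name matcher uses Python's standard `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher


def name_matcher(a: dict, b: dict) -> float:
    """Linguistic matcher: string similarity of the element names."""
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()


def type_matcher(a: dict, b: dict) -> float:
    """Constraint matcher: 1.0 if the data types are equal, else 0.0."""
    return 1.0 if a["type"] == b["type"] else 0.0


def composite_score(a: dict, b: dict, matchers: list) -> float:
    """Combine independent matcher results as a weighted average."""
    total_weight = sum(weight for _, weight in matchers)
    return sum(weight * matcher(a, b) for matcher, weight in matchers) / total_weight


# Hypothetical weights; a manual composition would let the user pick these.
MATCHERS = [(name_matcher, 0.7), (type_matcher, 0.3)]
price = {"name": "Price", "type": "decimal"}
amount = {"name": "price", "type": "decimal"}
print(composite_score(price, amount, MATCHERS))
```

Because each matcher runs on its own, adding or removing one only changes the `MATCHERS` list, which is the flexibility the slide attributes to composite matchers.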

LSD (Univ. of Washington) Learning Source Descriptions Uses machine learning techniques to match a new data source against a previously determined global schema. Uses a name matcher and several instance-level matchers. System is trained with sample user inputs and it learns patterns and matching rules. Mostly instance-oriented but can use schema information too. Also supports user input domain constraints on the global schema. Bernstein P, Rahm E. A survey of approaches to automatic schema matching

SKAT (Stanford University) Semantic Knowledge Articulation Tool Follows a rule-based approach to semi-automatically determine matches between two ontologies. User input required: * The user must provide application specific match/mismatch relations. * The user must approve or reject matches. SKAT matching is used within the ONION architecture for ontology integration. In ONION, an “articulation ontology” is constructed from the rules. Matching is based on is-a relationships between the articulation ontology and the source ontology. Bernstein P, Rahm E. A survey of approaches to automatic schema matching

TransScm (Tel Aviv University) Uses schema matching to derive an automatic data translation between schema instances. Schemas are transformed into labeled graphs. Matching is performed node by node (element-level, 1:1) starting at the top. Requires user intervention if no match is found (i.e. to provide a new rule). Bernstein P, Rahm E. A survey of approaches to automatic schema matching

DIKE (Univ. of Reggio Calabria, Univ. of Calabria) Compares pairs of objects by their attributes and the is-a relationships that they are involved in. These pairs are given a match score between 0 and 1. User must specify synonyms, homonyms, and inclusion properties. Bernstein P, Rahm E. A survey of approaches to automatic schema matching

Cupid (Microsoft Research)
Hybrid matcher: element and structure-level matches.
Phase 1: Linguistic element-level matching.
- categorizes elements based on name, data types, and domains.
- calculates a linguistic similarity coefficient.
Phase 2:
- transforms the original schema into a tree, then performs a bottom-up structure matching.
- calculates a similarity value.
- calculates a weighted mean of linguistic and structural similarity of pairs of elements.
Phase 3:
- uses the mean from phase 2 to decide on a mapping.
Bernstein P, Rahm E. A survey of approaches to automatic schema matching
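
The last step of phase 2 and phase 3 can be sketched as follows. This is not Cupid's implementation: the weight `w_struct`, the acceptance threshold and the input scores are hypothetical, and the linguistic and structural similarities are assumed to have been computed already.

```python
def weighted_similarity(lsim: float, ssim: float, w_struct: float = 0.5) -> float:
    """Weighted mean of linguistic (lsim) and structural (ssim) similarity."""
    return w_struct * ssim + (1 - w_struct) * lsim


def choose_mapping(scores: dict, w_struct: float = 0.5, threshold: float = 0.5) -> list:
    """Keep element pairs whose weighted similarity reaches the threshold.

    `scores` maps (s1_element, s2_element) to a (lsim, ssim) pair.
    """
    mapping = []
    for (s1, s2), (lsim, ssim) in scores.items():
        wsim = weighted_similarity(lsim, ssim, w_struct)
        if wsim >= threshold:
            mapping.append((s1, s2, wsim))
    # Best candidates first.
    return sorted(mapping, key=lambda m: m[2], reverse=True)


# Hypothetical similarity values for two candidate pairs.
SCORES = {("Address", "CustAddress"): (0.8, 0.6), ("Address", "Phone"): (0.1, 0.2)}
print(choose_mapping(SCORES))
```

Raising `w_struct` makes the decision lean on how the elements sit in their schema trees rather than on their names.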

Clio (IBM Almaden and Univ. of Toronto) Aims at a semi-automatic creation of match mappings between a given target schema and a new data source schema. Three Components: Schema Readers: read schema and translate it into an internal representation. Correspondence Engine: is used to identify matching parts of the schemas or databases. Mapping Generator: generates view definitions to map data in the source schema to data in the target schema. Bernstein P, Rahm E. A survey of approaches to automatic schema matching

Similarity flooding (Stanford Univ. and Univ. of Leipzig)
Graph matching algorithm. Converts schemas into directed labeled graphs and determines the matches between corresponding nodes of the graphs.
Uses a name matcher to get an initial element-level match that is then given to the structural matcher.
Bernstein P, Rahm E. A survey of approaches to automatic schema matching

Delta (Mitre)
Uses attribute descriptions to determine attribute matches. The method is to group the metadata about an attribute into a text string, which is treated as a document. The user is then presented with other ‘documents’ with matching attributes and can choose from those.
Bernstein P, Rahm E. A survey of approaches to automatic schema matching
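
The idea of treating attribute metadata as a document can be illustrated with a bag-of-words comparison. The slide does not say how Delta ranks documents, so the cosine similarity used here is a stand-in, and the attribute dictionaries are hypothetical.

```python
import math
import re
from collections import Counter


def as_document(attribute: dict) -> Counter:
    """Group all metadata values of an attribute into one bag of words."""
    text = " ".join(str(value) for value in attribute.values())
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(d1: Counter, d2: Counter) -> float:
    """Cosine similarity of two term-frequency vectors."""
    dot = sum(d1[t] * d2[t] for t in d1.keys() & d2.keys())
    norm = math.sqrt(sum(c * c for c in d1.values())) * math.sqrt(sum(c * c for c in d2.values()))
    return dot / norm if norm else 0.0


def rank_candidates(query: dict, candidates: list[dict]) -> list[dict]:
    """Order candidate attributes by similarity to the query attribute."""
    q = as_document(query)
    return sorted(candidates, key=lambda c: cosine(q, as_document(c)), reverse=True)


# Hypothetical attribute metadata.
QUERY = {"name": "emp_salary", "description": "annual salary of employee"}
CANDIDATES = [
    {"name": "phone", "description": "office phone number"},
    {"name": "pay", "description": "employee annual salary"},
]
print(rank_candidates(QUERY, CANDIDATES)[0]["name"])
```

The ranked list is what the user would browse; the final choice stays with the user, as the slide describes.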

Tess (Univ. of Massachusetts, Amherst) System for helping to cope with schema evolution. Takes a definition of the old schema and produces a program that will transform data that conforms to the old schema into data that conforms to the new schema. Bernstein P, Rahm E. A survey of approaches to automatic schema matching

MWSAF: Meteor-S Web Service Annotation Framework LSDIS Lab, UGA What is it? A tool for semi-automatically marking up web service descriptions with ontologies. It helps in describing services semantically and aids in efficient web service discovery and composition.

MWSAF Annotation Tool
Input: WSDL File
1. Individual elements of the WSDL are matched to concepts in the domain.
2. The WSDL is classified into a domain.
3. The matches are given to the user to accept or reject.
4. Upon the user’s acceptance, the annotations are written to the WSDL.
Output: WSDL File with semantic annotations

MWSAF Architecture
Main Components of the System:
1. Ontology Store: stores the DAML and RDF ontologies that will be used to annotate the WSDL files. Ontologies are categorized by domain.
2. Parser Library: consists of the parsers used to generate the SchemaGraphs.
3. Matcher Library: provides the schema matching algorithm.
Patil A, Oundhakar S, Sheth A, Verma K. METEOR-S Web service Annotation Framework

MWSAF Schema Graphs PROBLEM: The difference in expressiveness of XML Schema and ontology makes it very difficult to match these two models directly. MWSAF converts both models to a common representation format called SchemaGraph. A SchemaGraph is a set of nodes connected by edges that are created using conversion functions. Then it applies a matching algorithm to find the mappings between them. Patil A, Oundhakar S, Sheth A, Verma K. METEOR-S Web service Annotation Framework

MWSAF: Meteor-S Web Service Annotation Framework
XML to SchemaGraph conversion rules
XML schema fragment:
  <xsd:element maxOccurs="1" minOccurs="1" name="compass" nillable="true" type="xsd1:DirectionCompass" />
  <xsd:element maxOccurs="1" minOccurs="1" name="degrees" type="xsd:int" />
[Figure: SchemaNode representation of XML schema. Node labels: Direction, degrees, Direction Compass, compass; edge label: hasElement.]
Patil A, Oundhakar S, Sheth A, Verma K. METEOR-S Web service Annotation Framework.
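
A SchemaGraph can be pictured as a set of nodes plus labeled edges. The following is a minimal sketch of such a structure, not MWSAF's own class: the `SchemaGraph` API shown is hypothetical, and it assumes the figure links a Direction node to degrees and compass through hasElement edges.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaGraph:
    """Hypothetical common representation: nodes and (source, label, target) edges."""
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)

    def add_edge(self, source: str, label: str, target: str) -> None:
        """Add a labeled edge, registering both endpoints as nodes."""
        self.nodes.update((source, target))
        self.edges.add((source, label, target))

    def targets(self, source: str, label: str) -> list:
        """Nodes reachable from `source` over edges with the given label."""
        return sorted(t for s, l, t in self.edges if s == source and l == label)


# Assumed reading of the figure: Direction has the elements degrees and compass.
graph = SchemaGraph()
graph.add_edge("Direction", "hasElement", "degrees")
graph.add_edge("Direction", "hasElement", "compass")
print(graph.targets("Direction", "hasElement"))
```

An ontology class with properties could be loaded into the same structure using hasProperty edges, which is what makes the two models comparable by one matching algorithm.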

MWSAF: Meteor-S Web Service Annotation Framework
Ontology to SchemaGraph conversion rules
[Figure: ontology fragment with the comment "Superclass for all events dealing with wind" and the labels Wind event, Wind direction, Wind speed, shown next to the SchemaGraph representation of part of the ontology. Node labels: WindEvent, windDirection, windSpeed; edge label: hasProperty.]
Patil A, Oundhakar S, Sheth A, Verma K. METEOR-S Web service Annotation Framework.

Mapping
Measures of the Match Score:
- Element Level Match: linguistic similarity of two concepts based on names. Uses WordNet to check for synonyms. Abbreviations are also checked.
- Schema Match: structural similarity, sub-concept similarities.
The getBestMapping function then looks at the Match Scores and determines a map set.
Patil A, Oundhakar S, Sheth A, Verma K. METEOR-S Web service Annotation Framework

MWSAF Matching Techniques: ElemMatch
Name and String Matching algorithms:
- NGram: considers the number of q-grams that the names have in common.
- CheckSynonym: uses WordNet to find synonyms.
- CheckAbbreviations: uses an abbreviation dictionary.
- TokenMatcher: uses Porter Stemmer, tokenization and substring matching techniques.
Each algorithm returns a value between 0 and 1. These values are used in an equation for the final match score.
Patil A, Oundhakar S, Sheth A, Verma K. METEOR-S Web service Annotation Framework
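
The NGram idea can be sketched with a q-gram overlap score. This is not MWSAF's code: the slide does not give the exact formula, so the Dice coefficient over sets of q-grams is an assumption, as is the default q = 3.

```python
def qgrams(name: str, q: int = 3) -> set[str]:
    """Set of lowercase q-grams of a name; short names yield themselves."""
    s = name.lower()
    if len(s) < q:
        return {s} if s else set()
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def ngram_similarity(a: str, b: str, q: int = 3) -> float:
    """Dice coefficient over shared q-grams, a value between 0 and 1."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    if not ga or not gb:
        return 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


print(ngram_similarity("windSpeed", "speed"))
```

Like the other ElemMatch algorithms, the function returns a value between 0 and 1, so its result can be combined with theirs in a single match score.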

Matching
Once each WSDL has been compared against all of the ontologies in the store and a mapping has been created for each ontology, two measures are derived from the mapping:
- Average Concept Match: tells the user about the degree of similarity between matched concepts of the WSDL and ontology.
- Average Service Match: helps to categorize the service.
*We have a machine learning alternative for categorization!
Patil A, Oundhakar S, Sheth A, Verma K. METEOR-S Web service Annotation Framework

Current and Future Issues
- User Interaction: minimize user input but maximize the impact of the feedback.
- Real World Analysis: can the current matching techniques be used in real world situations?
- P2P data management
- Mapping Maintenance: what happens when you map between two schemas and then one changes?
- Developing global schemas (or ontologies) for domains.
- Dealing with inconsistent data values for a schema element.
Doan A, Halevy A. Semantic Integration Research in the Database Community: A Brief Survey.

More Issues
- If we require user acceptance for our matches, what happens if our matcher returns hundreds or thousands of matches?
- Is it unrealistic to think that we will eventually perfect our matchers?
Doan A, Halevy A. Semantic Integration Research in the Database Community: A Brief Survey.

Conclusion
- It is necessary to automate the matching process.
- Schema matching is very difficult and expensive.
- We have looked at a taxonomy and the descriptions of the existing approaches for matching:
  - Schema vs Instance-level
  - Element vs Structure-level
  - Language and Constraint based matchers
- We also discussed several implementations of the matching techniques.

References
Bernstein P, Rahm E. A survey of approaches to automatic schema matching.
Doan A, Halevy A. Semantic Integration Research in the Database Community: A Brief Survey.
Patil A, Oundhakar S, Sheth A, Verma K. METEOR-S Web service Annotation Framework. POSV-WWW2004.pdf
Vassilis C. Integrating XML Data Sources using RDF/S Schemas: The ICS-FORTH Semantic Web Integration Middleware (SWIM). Dagstuhl Seminar. ftp://ftp.dagstuhl.de/pub/Proceedings/04/04391/04391.ChristophidesVassilis.Slides.pdf

Questions?