A Method for Defining Semantic Similarities between GML Schemas
Angelo Augusto Frozza, Ronaldo dos Santos Mello {frozza,

Presentation transcript:

A Method for Defining Semantic Similarities between GML Schemas
Angelo Augusto Frozza (UFSC / UNIPLAC)
Ronaldo dos Santos Mello (UFSC)
GBD UFSC: Data Base Group of Santa Catarina Federal University

Summary
1. Introduction
2. Method overview
3. Preprocessing
4. Definition of the similarity score
5. Mapping catalog
6. Conclusion

Motivation
GIS have been extensively used by several kinds of organizations.
Organizations may need to interchange geographic data.
- Problem: data heterogeneity. The same geographic entity may have different representations in different organizations.
- Solutions for supporting geographic data interoperability among autonomous and heterogeneous sources are required.

Motivation
Information interchange among GIS must resolve heterogeneities at the following levels:
- syntactic
- semantic
Syntactic level: schema heterogeneity
- requires conversion of export and import formats
- does not ensure that the data have any meaning to new users
Semantic level: do two geographic entities represent the same real-world fact?

Trend
Current solutions for syntactic and semantic interoperability among GIS are based on the use of standards and ontologies.
Main initiatives
- Geography Markup Language (GML)
- Web Ontology Language (OWL)

Proposal
A method for semi-automated determination of semantic similarities between elements of distinct GML schemas
- uses an ontology as a basis for common knowledge
- may involve expert user intervention
Contributions
- Support for the development of GIS that require semantic interoperability
- Solution applied to recent technologies for representing geographic data and ontologies: GML and OWL
- The method is applied to the urban registration domain
  - Little explored in related work
  - Domain with large potential for practical applications
- The method focuses on the integration of small, non-interconnected data sources

The Proposed Method
[Figure: architecture of the proposed method in three stages: Input, Processing (on GML schema home), Output. Labels: OWL, GML, domain ontology wrapper, GML schema wrapper, similarity definition, mapping definition, (a), (b).]

The Proposed Method
[Figure: the Processing (on GML schema home) stage. Labels: OWL, GML, domain ontology wrapper, GML schema wrapper, similarity definition, mapping definition, (a), (b).]

Data Preprocessing
A wrapper is used to convert the ontology and the GML schemas into a canonical (tree) structure.
OWL (ontology) tree
- Complex element: O1 = Parcel
- Attributes: O2 = address (string); O3 = BlockNumber (integer)
- Relationships: O4 = isPart (Block, atomic); O5 = hasRepresentation (geographicRepresentation, multivalued)
GML tree
- Complex element: G1 = ParcelArea
- Attributes: G2 = address (string); G3 = Block (integer)
- Relationships: G4 = isPart (BlockMTR, atomic)
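The canonical tree on this slide could be held in memory roughly as follows. This is a minimal sketch, not the authors' wrapper: the `Node` class and its field names are hypothetical, and only the node names, types and cardinalities come from the slide.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of the canonical tree (hypothetical structure)."""
    name: str
    kind: str                  # "complex", "attribute" or "relationship"
    detail: str = ""           # data type, or target concept and cardinality
    children: list[Node] = field(default_factory=list)


# Ontology (OWL) side: O1 with children O2..O5, as listed on the slide.
ontology_tree = Node("Parcel", "complex", children=[
    Node("address", "attribute", "string"),
    Node("BlockNumber", "attribute", "integer"),
    Node("isPart", "relationship", "Block, atomic"),
    Node("hasRepresentation", "relationship",
         "geographicRepresentation, multivalued"),
])

# GML side: G1 with children G2..G4, as listed on the slide.
gml_tree = Node("ParcelArea", "complex", children=[
    Node("address", "attribute", "string"),
    Node("Block", "attribute", "integer"),
    Node("isPart", "relationship", "BlockMTR, atomic"),
])
```

Keeping attributes and relationships as children of the complex element makes the child counts of the two roots (4 and 3 here) directly available to a structure metric.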

Similarity Score Definition
Types of conflicts considered:
- Nomenclature: synonyms, homonyms
- Composition: structure (properties)
- Relationships: generalization/specialization, association

Similarity Score Definition
We adapt the metrics proposed by Dorneles et al. (2004):
- Metrics for Complex Values (MCV): applied to data structures (complex elements)
- Metrics for Atomic Values (MAV): applied to simple data (strings, dates, …); application-domain dependent
This metric set refers to a taxonomy appropriate to XML data handling.

Similarity Score Definition
Each GML schema tree node is tested against each ontology tree node
1. A node name is initially tested for equality against a table of synonyms
[Figure: table of synonyms.]

Similarity Score Definition
2. If one or more corresponding synonyms are found, a structure similarity metric is applied to each positive result
[Figure: the OWL tree (O1 to O5) and the GML tree (G1 to G4), with the label Parcel.]

Similarity Score Definition
3. If no corresponding synonym is found, a new search is done on the synonym table, applying a name similarity metric
Example: BlockMTR = Block
Chosen metric: Jaro-Winkler
- It extends the Jaro metric
- It prevents strings that differ only at the end from being scored as far apart
- It considers the concept of prefix

Similarity Score Definition
4. If the similarity score is acceptable, the structure similarity metric is applied to each result
The pair with the highest similarity score is chosen
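Steps 1 to 4 can be sketched as a single lookup function. Everything named here is an assumption made for illustration: the synonym table is modelled as a dict from a term to ontology concept names, the acceptance threshold of 0.75 is arbitrary, `structure_sim` is a caller-supplied callback, and `difflib.SequenceMatcher` stands in for the Jaro-Winkler metric the slides actually choose.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Stand-in name metric (NOT Jaro-Winkler): difflib ratio, case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(gml_name, synonym_table, structure_sim, threshold=0.75):
    """Return the ontology concept chosen for a GML node name, or None."""
    # Step 1: exact lookup in the synonym table.
    candidates = list(synonym_table.get(gml_name, []))
    if not candidates:
        # Step 3: approximate lookup using the name similarity metric.
        for term, concepts in synonym_table.items():
            if name_similarity(gml_name, term) >= threshold:
                candidates.extend(concepts)
    if not candidates:
        return None
    # Steps 2 and 4: score every candidate structurally, keep the highest.
    return max(candidates, key=lambda c: structure_sim(gml_name, c))


# Hypothetical synonym table and a constant structure score, for illustration.
table = {"Parcel": ["Parcel"], "Lot": ["Parcel"], "Block": ["Block"]}
chosen = best_match("BlockMTR", table, structure_sim=lambda g, c: 1.0)
```

Passing the structure metric as a callback keeps the lookup logic separate from how two subtrees are compared.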

Structure Similarity Metric
- εp: a node in set p
- εd: a node in set d
- p: set of element nodes from the GML schema tree
- d: set of class nodes from the ontology tree
- n and m: number of children of εp and εd, respectively

Similarity Score Definition
[Figure: nodes εp and εd.]

Simple Attribute Metric
This metric is composed of:
- the Jaro-Winkler metric for names
- data type compatibility analysis
Terms:
- nameSim: attribute name similarity
- typeSim: data type similarity
- names and data types have different weights
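The transcript does not preserve the formula itself, so the following is only one plausible reading: a weighted average of nameSim and typeSim. The weights, the compatibility table and the function names are all hypothetical.

```python
# Hypothetical compatibility scores between data types (not from the slides).
TYPE_COMPATIBILITY = {
    ("integer", "string"): 0.5,
}


def type_sim(a: str, b: str) -> float:
    """Data type similarity: 1.0 for equal types, else a table lookup."""
    if a == b:
        return 1.0
    return TYPE_COMPATIBILITY.get((a, b), TYPE_COMPATIBILITY.get((b, a), 0.0))


def attr_sim(name_sim: float, type_sim_value: float,
             w_name: float = 0.7, w_type: float = 0.3) -> float:
    """Weighted average of name and type similarity (assumed form)."""
    return (w_name * name_sim + w_type * type_sim_value) / (w_name + w_type)


# G2 = address (string) against O2 = address (string): identical name and type.
sim2 = attr_sim(name_sim=1.0, type_sim_value=type_sim("string", "string"))
```

Under any choice of weights, an identical name and type yields 1, which agrees with the deck's own example for the two address attributes.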

Jaro-Winkler Metric
JaroWinklerScore(s,t) = JaroScore(s,t) + (prefixLength * PREFIXSCALE * (1 - JaroScore(s,t)))
- prefixLength: the length of the common prefix at the start of the strings
- PREFIXSCALE: a constant scaling factor for how much the score is adjusted upwards for having common prefixes
Examples:
- Block, BlockMTR: 0.875 + (0.5 * 0.125) = 0.937
- ParcelCTM, ParcelTaxable: 0.820 + (0.6 * 0.179) = 0.927

Relationship Metric
This metric is composed of:
- the Jaro-Winkler metric for names
- concept similarity
- cardinality constraint analysis
Terms:
- nameSim: relationship name similarity
- concSim: concept similarity
- cardSim: cardinality similarity
- the components of the formula have different weights
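As with the attribute metric, the formula is not preserved in the transcript. The sketch below assumes a weighted average of the three components; the weights, the all-or-nothing cardinality score and the use of a name score between target concepts as concSim are illustrative assumptions only.

```python
def cardinality_sim(a: str, b: str) -> float:
    """Crude stand-in: 1.0 when both cardinalities are equal, else 0.0."""
    return 1.0 if a == b else 0.0


def rel_sim(name_sim: float, conc_sim: float, card_sim: float,
            weights: tuple[float, float, float] = (0.4, 0.4, 0.2)) -> float:
    """Weighted average of name, concept and cardinality similarity."""
    w_name, w_conc, w_card = weights
    total = w_name + w_conc + w_card
    return (w_name * name_sim + w_conc * conc_sim + w_card * card_sim) / total


# G4 = isPart (BlockMTR, atomic) against O4 = isPart (Block, atomic).
# conc_sim reuses the Block/BlockMTR name score as a hypothetical input.
score = rel_sim(
    name_sim=1.0,
    conc_sim=0.9375,
    card_sim=cardinality_sim("atomic", "atomic"),
)
```

Because the relationship names and cardinalities agree in this pair, only the concept term pulls the result below 1, whatever weights are chosen.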

Example of Similarity Definition
- sim2 = attrSim(G2, O2) = 1 [address, address]
- sim3 = attrSim(G3, O3) = 0.95 [BlockNumber, Block]
[Figure: OWL and GML trees with the levels complex element (O1, G1), attribute (O2, O3, G2, G3) and relationship (O4, O5, G4).]

Example of Similarity Definition
- sim4 = relSim(G4, O4) = 0.98 [isPart, isPart]
[Figure: OWL and GML trees with the levels complex element (O1, G1), attribute (O2, O3, G2, G3) and relationship (O4, O5, G4).]

Example of Similarity Definition
- tupleSim() = (sim2 + sim3 + sim4) / 4
- tupleSim() = (1 + 0.95 + 0.98) / 4 = 0.73
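The arithmetic of this example can be checked with a few lines. The divisor 4 equals the larger of the two child counts (O1 has four children, G1 has three), so the sketch divides by max(n, m); that reading is inferred from the example and is an assumption, since the structure formula is not preserved in the transcript. The function name is hypothetical.

```python
def tuple_sim(child_scores: list[float], n: int, m: int) -> float:
    """Sum of matched child scores over the larger child count (assumed form)."""
    largest = max(n, m)
    if largest == 0:
        return 0.0
    return sum(child_scores) / largest


# sim2, sim3, sim4 from the example; G1 has n = 3 children, O1 has m = 4.
score = tuple_sim([1.0, 0.95, 0.98], n=3, m=4)
print(round(score, 2))
```

Dividing by the larger count penalises the unmatched child (O5 has no counterpart in the GML tree), which is why three near-perfect matches still give a value well below 1.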

Mapping Catalog
The catalog is composed of two sets of tables:
1. Information about the imported GML schemas (metadata)
2. Schema mappings
   i. Each element of the main GML schema may have an equivalent concept in the ontology
   ii. Elements and similarities of the GML schemas are related to the concepts from the main GML schema and the ontology
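One way to picture the two sets of tables is a small relational layout. The table and column names below are hypothetical, chosen only to mirror the two items on the slide; the deck's actual catalog schema is not shown in the transcript. The sketch uses Python's built-in `sqlite3` module with an in-memory database.

```python
import sqlite3


def build_catalog() -> sqlite3.Connection:
    """Create an in-memory catalog with metadata and mapping tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- 1. Metadata about each imported GML schema (hypothetical columns).
        CREATE TABLE gml_schema (
            schema_id INTEGER PRIMARY KEY,
            source    TEXT NOT NULL
        );
        -- 2. Mappings from GML elements to ontology concepts, with scores.
        CREATE TABLE mapping (
            schema_id        INTEGER NOT NULL REFERENCES gml_schema(schema_id),
            gml_element      TEXT NOT NULL,
            ontology_concept TEXT NOT NULL,
            similarity       REAL NOT NULL
        );
    """)
    return conn


conn = build_catalog()
conn.execute("INSERT INTO gml_schema VALUES (1, 'example source')")
# The 0.73 score reuses the value from the deck's worked example.
conn.execute("INSERT INTO mapping VALUES (1, 'ParcelArea', 'Parcel', 0.73)")
rows = conn.execute(
    "SELECT gml_element, ontology_concept, similarity "
    "FROM mapping WHERE schema_id = 1"
).fetchall()
```

Storing the similarity score alongside each mapping lets an expert user review low-scoring pairs later, in line with the semi-automated nature of the method.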

Mapping Catalog - Example

Conclusion
The assumptions underlying our work are:
- Geographic data interchange happens mainly among domains with some affinity
- Geographic data are better defined semantically within a specific domain than through domain generalization
In this context, we expect our method to be useful as part of a system for GIS data integration.

Main Contribution
This work proposes a solution to the problem of semantic interoperability among GML schemas within the domain of urban registration.
Method characteristics
- an ontology that represents the domain knowledge
- semi-automated equivalence determination

Related Work
Related work focuses on translating queries executed in closely interconnected heterogeneous environments.
This work focuses on data integration in environments that are not necessarily interconnected.
This research includes a scenario where:
- small municipalities, individually, have no means to maintain complex systems
- geographic data are spread over many institutions
On the other hand, as a consortium, they could promote data interchange through a mechanism that identifies the similarity among their data.

Future Work
Define and run experiments to validate and improve the method
Broaden the scope of the domain
Extend the method to other domains
- Consider other ontologies
Support the integration of GML instances
Specify an environment for distributed geographic data queries based on the mappings

Thanks!
Angelo Augusto Frozza, Ronaldo dos Santos Mello
GBD UFSC: Data Base Group of Santa Catarina Federal University

Application: Urban Register Ontology

Application: Urban Register GML schema