Published by Irea Crowley. Modified over 11 years ago.
1
Angelo Augusto Frozza, Ronaldo dos Santos Mello {frozza, ronaldo}@inf.ufsc.br
A Method for Defining Semantic Similarities between GML Schemas
Angelo Augusto Frozza – UFSC / UNIPLAC
Ronaldo dos Santos Mello – UFSC
GBD UFSC: Data Base Group of Santa Catarina Federal University
2
Summary
1. Introduction
2. Method overview
3. Preprocessing
4. Definition of the similarity score
5. Mapping catalog
6. Conclusion
4
Motivation
Geographic Information Systems (GIS) are used extensively by many kinds of organizations
Organizations may need to interchange geographic data
– Problem: data heterogeneity; the same geographic entity may have different representations in different organizations
– Solutions that support geographic data interoperability among autonomous, heterogeneous sources are required
5
Motivation
Information interchange among GIS must resolve heterogeneity at two levels:
– syntactic
– semantic
Syntactic level: schema heterogeneity
– requires conversion between export and import formats
– does not ensure that the data are meaningful to new users
Semantic level: do two geographic entities represent the same real-world fact?
6
Trends
Current solutions for syntactic and semantic interoperability among GIS are based on standards and ontologies
Main initiatives:
– Geography Markup Language (GML)
– Web Ontology Language (OWL)
7
Proposal
A method for the semi-automated determination of semantic similarities between elements of distinct GML schemas
– uses an ontology as a basis for common knowledge
– may involve intervention by an expert user
Contributions
– Support for the development of GIS that require semantic interoperability
– A solution built on recent technologies for representing geographic data and ontologies: GML and OWL
– The method is applied to the urban registration domain
  – a domain not extensively explored in related work
  – a domain with large potential for practical applications
– The method focuses on the integration of small, non-interconnected data sources
9
The Proposed Method
[Figure: overview of the method in three parts: Input (OWL domain ontology, GML schemas); Processing, on the GML schema home (domain ontology wrapper, GML schema wrapper, similarity definition, mapping definition; steps labeled (a) and (b)); Output]
12
Data Preprocessing
A wrapper converts the ontology and the GML schemas into a canonical (tree) structure
Ontology (OWL) nodes:
– O1 = Parcel
– O2 = address (string)
– O3 = BlockNumber (integer)
– O4 = isPart (Block, atomic)
– O5 = hasRepresentation (geographicRepresentation, multivalued)
GML schema nodes:
– G1 = ParcelArea
– G2 = address (string)
– G3 = Block (integer)
– G4 = isPart (BlockMTR, atomic)
[Figure: OWL and GML trees side by side; complex element level: O1, G1; attribute level: O2, O3, G2, G3; relationship level: O4, O5, G4]
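The canonical trees of the example can be written down as a small data structure. This is a minimal sketch, assuming a single hypothetical `Node` class in which attributes and relationships are the children of the complex element, as in the figure; the class, its fields and `children_of_kind` are illustrative and are not the method's actual wrapper output.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a canonical tree (hypothetical representation)."""
    label: str                      # node identifier, e.g. "O2" or "G2"
    name: str                       # element, class or property name
    kind: str                       # "complex", "attribute" or "relationship"
    detail: tuple[str, ...] = ()    # data type, or (target concept, cardinality)
    children: list["Node"] = field(default_factory=list)


# Ontology (OWL) tree from the example; O1 is the root complex element.
ontology = Node("O1", "Parcel", "complex", children=[
    Node("O2", "address", "attribute", ("string",)),
    Node("O3", "BlockNumber", "attribute", ("integer",)),
    Node("O4", "isPart", "relationship", ("Block", "atomic")),
    Node("O5", "hasRepresentation", "relationship",
         ("geographicRepresentation", "multivalued")),
])

# GML schema tree from the example; G1 is the root complex element.
gml = Node("G1", "ParcelArea", "complex", children=[
    Node("G2", "address", "attribute", ("string",)),
    Node("G3", "Block", "attribute", ("integer",)),
    Node("G4", "isPart", "relationship", ("BlockMTR", "atomic")),
])


def children_of_kind(node: Node, kind: str) -> list[str]:
    """Return the labels of the children of `node` that have the given kind."""
    return [child.label for child in node.children if child.kind == kind]
```

For instance, `children_of_kind(ontology, "relationship")` returns `["O4", "O5"]`, matching the relationship level of the figure.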
14
Similarity Score Definition
Types of conflicts considered:
– Nomenclature: synonyms, homonyms
– Composition: structure (properties)
– Relationships: generalization/specialization, association
15
Similarity Score Definition
We adapt the metrics proposed by Dorneles et al. (2004):
– Metrics for Complex Values (MCV): applied to data structures (complex elements)
– Metrics for Atomic Values (MAV): applied to simple data (strings, dates, …); dependent on the application domain
This set of metrics is organized as a taxonomy suited to handling XML data.
16
Similarity Score Definition
Each node of the GML schema tree is compared with each node of the ontology tree
1. The node name is first tested for equality against a table of synonyms:
17
Similarity Score Definition
2. If one or more corresponding synonyms are found, a structure similarity metric is applied to each positive result
[Figure: OWL and GML canonical trees (nodes O1 to O5 and G1 to G4), labeled "Parcel"]
18
Similarity Score Definition
3. If no corresponding synonym is found, a new search is performed on the synonym table using a name similarity metric
– Example: BlockMTR = Block
Chosen metric: Jaro-Winkler
– it extends the Jaro metric
– it takes a common prefix into account
– as a result, strings that differ only at the end are not assigned a large distance
19
Similarity Score Definition
4. If the similarity score is acceptable, the structure similarity metric is applied to each result
The pair with the highest similarity score is chosen
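Steps 1 to 4 can be sketched as one lookup function. This is a minimal sketch under stated assumptions: the synonym table `SYNONYMS`, the threshold that counts as "acceptable" and the names `find_match` and `ratio` are all hypothetical; `structure_sim` stands for the structure similarity metric and is passed in; and `difflib.SequenceMatcher` from the Python standard library is swapped in for Jaro-Winkler only to keep the example short.

```python
from difflib import SequenceMatcher
from typing import Callable, Optional

# Hypothetical synonym table (ontology concept -> synonyms).
SYNONYMS: dict[str, set[str]] = {
    "Parcel": {"ParcelArea", "Lot"},
}


def terms(concept: str) -> set[str]:
    """The concept name plus its synonyms from the table."""
    return {concept} | SYNONYMS.get(concept, set())


def find_match(
    gml_name: str,
    concepts: list[str],
    structure_sim: Callable[[str, str], float],
    name_sim: Callable[[str, str], float],
    threshold: float = 0.75,  # hypothetical "acceptable" name score
) -> Optional[tuple[str, float]]:
    """Return (ontology concept, structure score) for a GML node, or None."""
    # Step 1: test the node name for equality against the synonym table.
    candidates = [c for c in concepts if gml_name in terms(c)]
    # Step 3: no synonym found, so search the table again with a name metric.
    if not candidates:
        candidates = [
            c for c in concepts
            if max(name_sim(gml_name, t) for t in terms(c)) >= threshold
        ]
    if not candidates:
        return None
    # Steps 2 and 4: score each candidate structurally and keep the best pair.
    return max(((c, structure_sim(gml_name, c)) for c in candidates),
               key=lambda pair: pair[1])


def ratio(a: str, b: str) -> float:
    """Standard-library stand-in for the Jaro-Winkler name metric."""
    return SequenceMatcher(None, a, b).ratio()
```

With this table, "ParcelArea" is resolved in step 1 (it is listed as a synonym of Parcel), while "BlockMTR" only reaches Block through the name metric in step 3.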
20
Structure Similarity Metric
– ε_p: a node in set p
– ε_d: a node in set d
– p: the set of element nodes from the GML schema tree
– d: the set of class nodes from the ontology tree
– n and m: the number of children of ε_p and ε_d, respectively
21
Similarity Score Definition
[Figure: nodes ε_p and ε_d]
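The slide's structure formula is not recoverable here. The sketch below encodes one plausible reading, stated as an assumption: the children of ε_p and ε_d are paired by their best scores, each child used at most once, and the sum is divided by max(n, m), so unmatched children lower the score. This reading is consistent with the worked example on slide 27, where three matched pairs are divided by 4, the number of children of O1. The function name and the greedy pairing are illustrative choices, not the authors' definition; an optimal assignment could replace the greedy loop.

```python
from itertools import product
from typing import Callable, Sequence


def structure_sim(
    p_children: Sequence[str],
    d_children: Sequence[str],
    sim: Callable[[str, str], float],
) -> float:
    """Hypothetical structure similarity between nodes ε_p and ε_d.

    p_children: labels of the n children of ε_p (GML schema tree)
    d_children: labels of the m children of ε_d (ontology tree)
    sim:        similarity of one GML child and one ontology child
    """
    n, m = len(p_children), len(d_children)
    if max(n, m) == 0:
        return 1.0  # assumption: two leaf nodes are structurally identical
    # Score every child pair, best first.
    pairs = sorted(
        ((sim(a, b), a, b) for a, b in product(p_children, d_children)),
        key=lambda triple: triple[0],
        reverse=True,
    )
    used_p: set[str] = set()
    used_d: set[str] = set()
    total = 0.0
    for score, a, b in pairs:
        # Greedy one-to-one pairing; zero-score pairs add nothing.
        if score <= 0.0 or a in used_p or b in used_d:
            continue
        used_p.add(a)
        used_d.add(b)
        total += score
    # Unmatched children contribute 0 because the sum is divided by max(n, m).
    return total / max(n, m)
```

Feeding it the pair scores of the example (G2/O2 = 1, G3/O3 = 0.95, G4/O4 = 0.98, all other pairs 0) gives 2.93 / 4 = 0.7325, since O5 has no counterpart among the children of G1.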
22
Simple Attribute Metric
This metric combines:
– the Jaro-Winkler metric for names
– a data type compatibility analysis
nameSim: attribute name similarity
typeSim: data type similarity
Names and data types have different weights
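The slide states that names and types are weighted differently but gives neither the weights nor the type rules. A minimal sketch, assuming attrSim is a weighted sum of nameSim and typeSim with hypothetical weights 0.6 and 0.4 and a hypothetical compatibility scheme (`NUMERIC`, `type_sim`); nameSim is passed in as a number, for example a Jaro-Winkler score.

```python
# Hypothetical compatibility rules; the slides do not specify them.
NUMERIC = {"integer", "decimal", "double"}


def type_sim(t1: str, t2: str) -> float:
    """Data type similarity (typeSim) under assumed compatibility rules."""
    if t1 == t2:
        return 1.0
    if t1 in NUMERIC and t2 in NUMERIC:
        return 0.8  # assumed: numeric types are largely interchangeable
    if "string" in (t1, t2):
        return 0.5  # assumed: any value can be stored as a string
    return 0.0


def attr_sim(name_sim: float, t1: str, t2: str,
             w_name: float = 0.6, w_type: float = 0.4) -> float:
    """attrSim as a weighted sum of nameSim and typeSim (weights assumed)."""
    return w_name * name_sim + w_type * type_sim(t1, t2)
```

Keeping the weights summing to 1 keeps attrSim in [0, 1]; for the address/address pair, `attr_sim(1.0, "string", "string")` gives 1.0.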
23
Jaro-Winkler Metric
JaroWinklerScore(s, t) = JaroScore(s, t) + (prefixLength * PREFIXSCALE * (1 - JaroScore(s, t)))
– prefixLength: the length of the common prefix at the start of the two strings
– PREFIXSCALE: a constant scaling factor for how much the score is adjusted upwards for having a common prefix
Examples (prefixLength = 5 and 6, PREFIXSCALE = 0.1):
– Block / BlockMTR: 0.875 + (0.5 * 0.125) = 0.9375
– ParcelCTM / ParcelTaxable: 0.820 + (0.6 * 0.179) = 0.927
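The formula above translates directly into code. A minimal sketch, assuming the classic case-sensitive Jaro definition (matching window of max(|s|, |t|) / 2 - 1 characters) and, by default, Winkler's usual cap of 4 prefix characters; the slide's Block/BlockMTR example uses the full 5-character prefix, which `max_prefix=None` reproduces. The function names are illustrative, not a specific library's API, and because Jaro implementations differ in details, other examples on the slide may not be reproduced exactly.

```python
from typing import Optional


def jaro(s: str, t: str) -> float:
    """Classic Jaro similarity (case-sensitive)."""
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_matched = [False] * len(s)
    t_matched = [False] * len(t)
    matches = 0
    for i, ch in enumerate(s):
        # Look for an unused equal character within the matching window.
        for j in range(max(0, i - window), min(i + window + 1, len(t))):
            if not t_matched[j] and t[j] == ch:
                s_matched[i] = t_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters in a different order count as half-transpositions.
    s_seq = [c for c, ok in zip(s, s_matched) if ok]
    t_seq = [c for c, ok in zip(t, t_matched) if ok]
    transpositions = sum(a != b for a, b in zip(s_seq, t_seq)) / 2
    return (matches / len(s) + matches / len(t)
            + (matches - transpositions) / matches) / 3


def common_prefix_length(s: str, t: str) -> int:
    """Number of leading characters shared by s and t."""
    n = 0
    for a, b in zip(s, t):
        if a != b:
            break
        n += 1
    return n


def jaro_winkler(s: str, t: str, prefix_scale: float = 0.1,
                 max_prefix: Optional[int] = 4) -> float:
    """JaroScore + prefixLength * PREFIXSCALE * (1 - JaroScore)."""
    j = jaro(s, t)
    prefix = common_prefix_length(s, t)
    if max_prefix is not None:
        prefix = min(prefix, max_prefix)
    return j + prefix * prefix_scale * (1 - j)
```

`jaro_winkler("Block", "BlockMTR", max_prefix=None)` gives 0.9375 (Jaro 0.875 plus 0.5 * 0.125); with the default cap of 4 it gives 0.925. Without a cap, prefixLength * PREFIXSCALE should stay at or below 1 to keep the score within [0, 1].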
24
Relationship Metric
This metric combines:
– the Jaro-Winkler metric for names
– concept similarity
– cardinality constraint analysis
nameSim: relationship name similarity
concSim: concept similarity
cardSim: cardinality similarity
The components of the formula have different weights
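In the same spirit, a minimal sketch of the relationship metric, assuming relSim is a weighted sum of its three components. The weights (0.4, 0.4, 0.2) and the partial credit for differing cardinalities are hypothetical; nameSim and concSim are passed in as numbers (for example, concSim could be a name score between the related concepts, such as Block and BlockMTR for isPart).

```python
def card_sim(c1: str, c2: str) -> float:
    """Cardinality similarity (cardSim); the scores are assumptions."""
    if c1 == c2:
        return 1.0
    return 0.5  # assumed partial credit, e.g. "atomic" vs "multivalued"


def rel_sim(name_sim: float, conc_sim: float, c1: str, c2: str,
            weights: tuple[float, float, float] = (0.4, 0.4, 0.2)) -> float:
    """relSim as a weighted sum of nameSim, concSim and cardSim (weights assumed)."""
    w_name, w_conc, w_card = weights
    return w_name * name_sim + w_conc * conc_sim + w_card * card_sim(c1, c2)
```

With these assumed weights, `rel_sim(1.0, 0.9375, "atomic", "atomic")` gives 0.975; the slides' own weights are not stated, so this is not meant to reproduce their figures.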
25
Example of Similarity Definition
– sim2 = attrSim(G2, O2) = 1 [address, address]
– sim3 = attrSim(G3, O3) = 0.95 [Block, BlockNumber]
[Figure: OWL and GML trees; complex elements O1 and G1; attributes O2, O3, G2, G3; relationships O4, O5, G4]
26
Example of Similarity Definition
– sim4 = relSim(G4, O4) = 0.98 [isPart, isPart]
27
Example of Similarity Definition
– tupleSim() = (sim2 + sim3 + sim4) / 4
– tupleSim() = (1 + 0.95 + 0.98) / 4 = 2.93 / 4 ≈ 0.73
29
Mapping Catalog
The catalog consists of two sets of tables:
1. Information about the imported GML schemas (metadata)
2. Schema mappings
   i. Each element of the main GML schema may have an equivalent concept in the ontology
   ii. Elements of the GML schemas and their similarity scores are related to the concepts of the main GML schema and of the ontology
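The two table sets can be sketched as a relational schema. A minimal sketch, assuming an SQLite layout built with Python's standard `sqlite3` module; every table and column name and the example rows are hypothetical, since the slides name the table sets but not their columns.

```python
import sqlite3

# Hypothetical table layout for the mapping catalog.
SCHEMA = """
-- Table set 1: metadata about the imported GML schemas
CREATE TABLE gml_schema (
    schema_id   INTEGER PRIMARY KEY,
    file_name   TEXT NOT NULL,
    institution TEXT
);
-- Table set 2 (i): main GML schema element -> equivalent ontology concept
CREATE TABLE concept_mapping (
    main_element     TEXT PRIMARY KEY,
    ontology_concept TEXT NOT NULL
);
-- Table set 2 (ii): elements of imported schemas and their similarity scores
CREATE TABLE element_similarity (
    schema_id    INTEGER REFERENCES gml_schema(schema_id),
    element      TEXT NOT NULL,
    main_element TEXT REFERENCES concept_mapping(main_element),
    score        REAL NOT NULL CHECK (score BETWEEN 0 AND 1)
);
"""


def build_catalog() -> sqlite3.Connection:
    """Create an in-memory catalog with illustrative (not real) rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO gml_schema VALUES (1, 'schema_a.xsd', 'Municipality A')")
    conn.execute("INSERT INTO concept_mapping VALUES ('ParcelArea', 'Parcel')")
    conn.execute("INSERT INTO element_similarity VALUES (1, 'Lot', 'ParcelArea', 0.8)")
    return conn


def resolve(conn: sqlite3.Connection, schema_id: int, element: str):
    """Map an imported element to (main element, ontology concept, score)."""
    return conn.execute(
        """
        SELECT s.main_element, m.ontology_concept, s.score
        FROM element_similarity AS s
        JOIN concept_mapping AS m ON m.main_element = s.main_element
        WHERE s.schema_id = ? AND s.element = ?
        """,
        (schema_id, element),
    ).fetchone()
```

A lookup such as `resolve(build_catalog(), 1, "Lot")` follows an imported element through the main GML schema to its ontology concept, which is the chain described in item 2 (ii).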
30
Mapping Catalog - Example
32
Conclusion
The assumptions underlying our work are:
– geographic data interchange happens mainly among domains with some affinity
– geographic data are better defined semantically within a specific domain than through domain generalization
In this context, we expect our method to be useful as part of a GIS data integration system
33
Main Contribution
This work proposes a solution to the problem of semantic interoperability among GML schemas within the urban registration domain
Method characteristics:
– an ontology that represents the domain knowledge
– semi-automated determination of equivalences
34
Related Work
Related work focuses on translating queries executed in closely interconnected heterogeneous environments
This work focuses on data integration in environments that are not necessarily interconnected
This research considers a scenario where:
– small municipalities, individually, lack the means to maintain complex systems
– geographic data are spread over many institutions
On the other hand, as a consortium, these municipalities could promote data interchange through a mechanism that identifies the similarities among their data
35
Future Work
– Define and run experiments to validate and improve the method
– Broaden the scope of the domain
– Extend the method to other domains
  – consider other ontologies
– Support the integration of GML instances
– Specify an environment for distributed geographic data queries based on the mappings
36
A Method for Defining Semantic Similarities between GML Schemas
Thanks!
GBD UFSC: Data Base Group of Santa Catarina Federal University
Angelo Augusto Frozza, Ronaldo dos Santos Mello {frozza, ronaldo}@inf.ufsc.br
37
Application: Urban Register Ontology
38
Application: Urban Register GML schema