Ontology Alignment Semantic Web - Spring 2007 Computer Engineering Department Sharif University of Technology.

The Problem
- Like the Web, the Semantic Web will by design be distributed and heterogeneous.
- Ontologies are used in it to support interoperability and common understanding between different parties.
- Ontologies themselves may be heterogeneous.
- Ontology alignment is needed to find semantic relationships among the entities of different ontologies.
[Figure: items labeled a, b, c, d with the question "How should I use them?"]

Need for Ontology Merging
- There is significant overlap in existing ontologies:
  - Yahoo! and DMOZ Open Directory
  - Product catalogs for similar domains

Terminology
- Mapping: a formal expression that states the semantic relation between two entities belonging to different ontologies. Given two ontologies O1 and O2, mapping one ontology onto another means that for each entity (concept C, relation R, or instance I) in ontology O1, we try to find a corresponding entity with the same intended meaning in ontology O2: map(e_{1i}) = e_{2j}. Complex mappings (n:m, concept-relation, …) are not addressed.
- Ontology Alignment: a set of correspondences between two or more (in the case of multi-alignment) ontologies. These correspondences are expressed as mappings.
- Ontology Coordination: the broadest term, which applies whenever knowledge from two or more ontologies must be used at the same time in a meaningful way (e.g. to achieve a single goal).
- Ontology Transformation: a general term for any process that leads to a new ontology o’ from an ontology o by using a transformation function t.

An Example of Alignment
[Figure: Car (Ontology A) ? Automobile (Ontology B)]

Terminology cont.  Ontology Translation: an ontology transformation function t for translating an ontology o written in some language L into another ontology o’ written in a distinct language L’.  Ontology Merging: the creation of a new ontology from two (possibly overlapping) source ontologies. This concept is closely related to that of integration in the database community.  Ontology Reconciliation: a process that harmonizes the content of two (or more) ontologies, typically requiring changes on one of the two sides or even on both sides.

An Example of Ontology Merging
[Figure sequence, four slides: two source hierarchies are merged step by step. Node labels in the sources: Thing, Automobile, Family Car, Sport Car, Porsche; Object, Vehicle, Car, Bus, Luxury Car, Family Car, Sport Car, BMW. Node labels in the merged result: "Object, Thing"; Vehicle; "Car, Automobile"; Bus; Luxury Car; Family Car; Sport Car; Porsche; BMW.]

Process
[Figure: alignment process from Input to Output with the stages Features, Entity Pair Selection, Similarity, Aggregation, Interpretation, repeated over Iterations]

Features
[Figure: example ontology with the labels Object, Vehicle, Car, Boat, Owner, Speed, hasOwner, hasSpeed, Porsche KA-123, Marc, 250 km/h]

Similarity Measure  String similarity  Object Similarity  Set similarity

Similarity Rules

Entity      Feature      Similarity Measure
Concepts    label        String Similarity
            subclassOf   Set Similarity
            instances    Set Similarity
            …
Relations
Instances

Combination  How are the individual similarity measures combined?  Linearly  Weighted  Special Function

Interpretation
- From similarities to mappings
- Threshold t: map(e_{1j}) = e_{2j} ← sim(e_{1j}, e_{2j}) > t
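The combination and interpretation steps can be sketched together. This is only an illustration under assumptions: the feature names, the weights, and the rule of keeping the single best target per source entity are hypothetical choices, not something the slides prescribe.

```python
def aggregate(similarities, weights):
    """Weighted linear combination of per-feature similarities in [0, 1].

    `similarities` and `weights` are dicts keyed by feature name
    (hypothetical names such as "label" or "subclassOf").
    """
    total = sum(weights[name] for name in similarities)
    return sum(weights[name] * value for name, value in similarities.items()) / total


def interpret(sim_table, threshold):
    """Turn a {(e1, e2): similarity} table into a mapping e1 -> e2.

    A pair is kept only if sim(e1, e2) > threshold; when several targets
    qualify for the same e1, the highest-scoring one wins (an assumption).
    """
    best = {}
    for (e1, e2), score in sim_table.items():
        if score > threshold and score > best.get(e1, (None, threshold))[1]:
            best[e1] = (e2, score)
    return {e1: e2 for e1, (e2, _) in best.items()}
```

For example, `interpret({("Car", "Automobile"): 0.9, ("Car", "Bus"): 0.4}, 0.5)` keeps only the Car/Automobile pair.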

Forms of Heterogeneity in Ontologies
- Syntactic: depends on the choice of the representation (OWL, RDFS, DAML, N3, DATALOG, PROLOG, …).
- Terminological: all forms of mismatch related to the process of naming the entities (e.g. individuals, classes, properties, relations) that occur in an ontology. Typical examples:
  - different words are used to name the same entity (synonymy);
  - the same word is used to name different entities (polysemy);
  - words from different languages (English, French, etc.) are used to name entities;
  - syntactic variations of the same word (different acceptable spellings, abbreviations, use of optional prefixes or suffixes, etc.).
- Mismatches at the terminological level are not as deep as those occurring at the conceptual level. However, most real cases have to do with the terminological level (e.g., with the way different people name the same entities), and therefore this level is at least as crucial as the other one.

Heterogeneity in Ontologies, cont.
- Conceptual: mismatches that have to do with the content of an ontology.
- Metaphysical differences, which have to do with how the world is “broken into pieces”:
  - Coverage: the ontologies cover different (possibly overlapping) portions of the world.
  - Granularity: one ontology provides a more (or less) detailed description of the same entities.
  - Perspective: an ontology may provide a viewpoint that is different from the viewpoint adopted in another ontology.

Heterogeneity in Ontologies, cont. Metaphysical differences:

Overcoming Heterogeneity
- One common approach to the problems of heterogeneity is the definition of relations across the heterogeneous representations.
- These relations can be used for transforming expressions of one ontology into a form compatible with that of the other.
- This may happen at any level:
  - syntactic: through semantics-preserving transducers;
  - terminological: through functions mapping lexical information;
  - conceptual: through general transformation of the representations (sometimes requiring a complete prover for some languages).

Structure of Mapping
- Alignment: a process that starts from two representations o and o’ and produces a set of mappings between pairs of (simple or complex) entities belonging to o and o’ respectively.
- Intuitively, we assume that in general a mapping can be described as a quadruple <e, e’, n, R>, where:
  - e and e’ are the entities between which a relation is asserted by the mapping;
  - n is a degree of trust (confidence) in that mapping;
  - R is the relation associated with the mapping, i.e. the relation holding between e and e’.
- R can be:
  - a simple set-theoretic relation
  - a fuzzy relation
  - a probabilistic distribution over a complete set of relations
  - a similarity measure

Similarity  There are many ways to assess the similarity between two entities. The most common way amounts to defining a measure of this similarity.  The characteristics which can be asked from these measures:

Overcoming Heterogeneity Using Similarity
- Local Methods
  - Terminological Methods
    - String Based Methods
    - Token Based Methods
    - Language Based Methods
  - Structural Methods
    - Internal Structure
    - External Structure
  - Extensional (based on instances) Methods
    - When the classes share the same instances
    - When they do not

Terminological Methods
- Terminological methods compare strings.
- They can be applied to:
  - names
  - labels
  - comments concerning entities
  - URIs
- They take advantage of the structure of the string (as a sequence of letters).
- The main idea behind such measures is that similar entities usually have similar names and descriptions in different ontologies.

Terminological M., cont. (Normalization)
- A number of normalization procedures help improve the results of subsequent comparison:
  - Case normalization: converting each alphabetic character in the strings to its lower-case counterpart;
  - Diacritics suppression: replacing characters with diacritic signs by their most frequent replacement (e.g. replacing Montréal with Montreal);
  - Blank normalization: normalizing all blank characters (blank, tabulation, carriage return) into a single blank character;
  - Link stripping: normalizing some links between words (e.g. replacing apostrophes and underscores with dashes);
  - Stopword elimination: eliminating words that can be found in a list (usually words like “to”, “a”, ...).
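The normalization steps listed above can be chained into one function. The following is a minimal sketch using only the Python standard library; the stopword list and the order of the steps are assumptions made for illustration.

```python
import re
import unicodedata

# Illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"a", "an", "the", "to", "of"}


def normalize(label):
    """Apply the five normalization steps to one label."""
    text = label.lower()                                   # case normalization
    text = unicodedata.normalize("NFKD", text)             # split base char + diacritic
    text = "".join(ch for ch in text if not unicodedata.combining(ch))  # diacritics suppression
    text = re.sub(r"['_]", "-", text)                      # link stripping
    text = re.sub(r"\s+", " ", text).strip()               # blank normalization
    words = [w for w in text.split(" ") if w not in STOPWORDS]  # stopword elimination
    return " ".join(words)
```

With this sketch, `normalize("Montréal")` gives `"montreal"`, matching the example in the slide.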

Terminological M., cont. (String Based)
- Substring Similarity
- Hamming Distance
- N-Gram Distance
- Edit Distance
- Jaro Similarity
- Token Based Distances
  - Term Frequency Inverse Document Frequency (TF/IDF)
- Path Distance: compares not only the labels of objects but also the sequence of labels of the entities to which those bearing the label are related.
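As one concrete instance of the measures listed above, here is a small n-gram comparison. It is a sketch under an assumption: the score is normalized as the Jaccard ratio of the two n-gram sets, which is only one of several normalizations in use.

```python
def ngrams(text, n=3):
    """Set of all character n-grams of `text` (empty if text is shorter than n)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_similarity(s, t, n=3):
    """Shared n-grams divided by all distinct n-grams of both strings."""
    a, b = ngrams(s, n), ngrams(t, n)
    if not a and not b:
        # Both strings are too short to have n-grams: fall back to equality.
        return 1.0 if s == t else 0.0
    return len(a & b) / len(a | b)
```

For "article" and "aricle" the shared trigrams are "icl" and "cle", out of seven distinct trigrams overall.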

Terminological M., cont. (String Methods)
- In string edit distance, the operations usually considered are insertion of a character, replacement of a character by another, and deletion of a character.
- Levenshtein distance is an edit distance with all costs set to 1.
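The Levenshtein distance described above can be computed with the standard dynamic-programming recurrence. This is a plain textbook sketch, not code from any particular alignment tool.

```python
def levenshtein(s, t):
    """Edit distance with unit cost for insertion, deletion and replacement."""
    previous = list(range(len(t) + 1))   # distances from the empty prefix of s
    for i, cs in enumerate(s, start=1):
        current = [i]                    # distance from s[:i] to the empty string
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            current.append(min(
                previous[j] + 1,         # delete cs
                current[j - 1] + 1,      # insert ct
                previous[j - 1] + cost,  # replace cs by ct (or keep it)
            ))
        previous = current
    return previous[-1]
```

A similarity in [0, 1] can be derived from it, for instance as 1 - distance / max(len(s), len(t)), if the two strings are not both empty.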

Terminological M., cont. (Language Based)
- These methods rely on NLP techniques to find associations between instances of concepts or classes.
- Intrinsic methods: perform the terminological matching with the help of morphological and syntactic analysis to normalize terms (stemming: going → go).
- Extrinsic methods: make use of external resources such as dictionaries and lexicons (WordNet).
  - Resnik Semantic Similarity

Structural Methods
- The structure of the entities found in an ontology can be compared, instead of comparing their names or identifiers.
- Internal Structure: uses criteria such as the range of their properties (attributes and relations), their cardinality, and the transitivity and/or symmetry of their properties to calculate the similarity between them.
- External Structure: the similarity comparison between two entities from two ontologies can be based on the position of the entities within their hierarchies.

Structural Methods (External)  If two entities from two ontologies are similar, their neighbors might also be somehow similar.  Criteria for deciding that the two entities are similar include: Their direct super-entities are already similar. Their sibling-entities are already similar. Their direct sub-entities are already similar. All (or most) of their descendant-entities (entities in the sub tree rooted at the entity in question) are already similar. All (or most) of their leaf-entities are already similar. All (or most) of entities in the paths from the root to the entities in question are already similar.

Structural Methods (External), cont.  Existing Approaches: Structural topological dissimilarity on hierarchies Upward Cotopic Distance

Extensional (based on instances) Methods  Compares the extension of classes, i.e., their set of instances rather than their interpretation.  Conditions in which such techniques can be used: When the classes share the same instances When they do not
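For the case where the classes share the same instances, one simple way to compare extensions is a set-overlap ratio. The sketch below uses the Jaccard coefficient, which is an assumption here: the slides mention set similarity without naming a specific measure.

```python
def extension_similarity(instances_a, instances_b):
    """Jaccard overlap of two classes' instance sets, in [0, 1]."""
    a, b = set(instances_a), set(instances_b)
    if not a and not b:
        return 0.0  # no evidence either way when both extensions are empty
    return len(a & b) / len(a | b)
```

When the classes do not share instances, this ratio is always 0, so a different technique (for example, comparing the instances themselves) is needed.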

Global Methods
- After the local similarities have been calculated, it remains to compute the alignment. This involves some more global treatments, including:
  - aggregating the results of these base methods in order to compute the similarity between compound entities;
  - developing a strategy for computing these similarities in spite of cycles and non-linearity in the constraints governing similarities;
  - organizing the combination of various similarity / alignment algorithms;
  - involving the user in the loop;
  - finally extracting the alignments from the resulting (dis)similarity.

Compound similarity

Global similarity computation
- The computation of compound similarity is still local because it only provides similarity considering the neighborhood of a node.
- Similarity may involve the ontologies as a whole, and the final similarity values may ultimately depend on all the ontologies.
- The distance defined by local methods can be defined in a circular way (for instance, if the distance between two classes depends on the distances between their instances, which themselves depend on the distance between their classes, or if there are cycles in the ontology).
- Strategies must be defined in order to compute this global similarity:
  - Similarity Flooding
  - Similarity equation fixpoint

Global similarity (Similarity Flooding)
- The two ontologies are first translated into directed labeled graphs.
- The method creates another graph G whose nodes are pairs of nodes of the initial graphs, with an edge between (o1, o’1) and (o2, o’2) labeled by p whenever there are edges (o1, p, o2) in the first graph and (o’1, p, o’2) in the second one.
- It computes initial similarity values between nodes (based on their labels, for instance) and then iterates, re-computing the similarity of each node as a function of the similarities of its adjacent nodes at the previous step.
- It stops when no similarity changes by more than a particular threshold, or after a predetermined number of steps.
- It uses a weighted linear aggregation in which the weight of an edge is the inverse of the number of other edges with the same label reaching the same pair of entities.
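The steps above can be sketched as a small fixpoint loop. This is a simplified illustration, not the published algorithm in full: the edge-list input format, the propagation in both directions along each edge, and the normalization by the maximum value are assumptions.

```python
from collections import defaultdict


def similarity_flooding(edges1, edges2, initial, max_steps=50, epsilon=1e-4):
    """edges1/edges2: lists of (source, label, target); initial: {(n1, n2): sim}."""
    # Pairwise graph: one edge per pair of same-labeled edges in the two graphs.
    pair_edges = [((a, b), p, (a2, b2))
                  for (a, p, a2) in edges1
                  for (b, q, b2) in edges2 if p == q]
    # Count same-labeled edges leaving / reaching each pair node (for the weights).
    out_count, in_count = defaultdict(int), defaultdict(int)
    for src, p, dst in pair_edges:
        out_count[(src, p)] += 1
        in_count[(dst, p)] += 1
    nodes = {n for src, _, dst in pair_edges for n in (src, dst)}
    sim = {n: initial.get(n, 0.0) for n in nodes}
    for _ in range(max_steps):
        new = dict(sim)
        for src, p, dst in pair_edges:
            new[dst] += sim[src] / out_count[(src, p)]   # forward propagation
            new[src] += sim[dst] / in_count[(dst, p)]    # backward propagation
        top = max(new.values(), default=0.0)
        if top > 0:
            new = {n: v / top for n, v in new.items()}   # normalize to [0, 1]
        delta = max((abs(new[n] - sim[n]) for n in nodes), default=0.0)
        sim = new
        if delta < epsilon:                              # no change above threshold
            break
    return sim
```

The returned scores are relative: after normalization the best-supported pair has similarity 1.0, and an extraction step still has to choose the final correspondences.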

Learning Methods
- As in many other fields, methods developed in machine learning prove useful in ontology alignment.
- Two particular areas:
  - Supervised learning, in which the ontology alignment algorithm learns how to work through the presentation of many good alignments (positive examples) and bad alignments (negative examples).
    - It is difficult to know which techniques work well for which ontology features.
    - An ontology alignment algorithm learnt from several ontology pairs might not necessarily work well for a new ontology pair.
  - Learning from data, in which a population of instances is communicated to the algorithm together with their relations and the classes they belong to.

User Feedback
- Supporting effective interaction between the user and the system components is one concern of ontology alignment.
- User input can take place in many areas of alignment:
  - assessing initial similarity between some terms;
  - invoking and composing alignment methods;
  - accepting or refusing the similarity or alignment provided by the various methods.

Alignment Extraction
- The ultimate goal of alignment is a satisfactory set of correspondences between ontologies.
- Manual extraction: display the entity pairs with their similarity scores and/or ranks, and leave the choice of the appropriate pairs to the user of the alignment tool.
- Automatic extraction using thresholds:
  - Hard threshold: retains all the correspondences above a threshold n;
  - Delta method: uses as a threshold the highest similarity value minus a particular constant d;
  - Proportional method: uses as a threshold a percentage of the highest similarity value;
  - Percentage: retains the top n% of correspondences.
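The four threshold variants differ only in how the cut-off is derived. A minimal sketch follows, assuming a non-empty `{pair: similarity}` dict as input and treating scores equal to the cut-off as retained (both are choices made for this illustration).

```python
def hard_threshold(scores, n):
    """Keep every correspondence whose similarity is at least n."""
    return {pair: s for pair, s in scores.items() if s >= n}


def delta_threshold(scores, d):
    """Cut-off is the highest similarity minus the constant d."""
    best = max(scores.values())
    return {pair: s for pair, s in scores.items() if s >= best - d}


def proportional_threshold(scores, fraction):
    """Cut-off is a fraction (e.g. 0.5 for 50%) of the highest similarity."""
    best = max(scores.values())
    return {pair: s for pair, s in scores.items() if s >= best * fraction}


def percentage(scores, percent):
    """Keep the top `percent`% of correspondences, ranked by similarity."""
    keep = int(len(scores) * percent / 100)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:keep])
```

The delta and proportional methods adapt to the score range of a given run, whereas the hard threshold assumes scores are comparable across runs.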

Alignment Extraction, cont.
- Automatic extraction using optimization of the result:
  - If an injective mapping is required, some choices need to be made in order to maximize the “quality” of the alignment, which is typically measured as the total similarity of the aligned entity pairs.
  - A greedy alignment algorithm could construct the correspondences step-wise, at each step selecting the most similar pair and deleting its members from the table. The algorithm stops when no pair remains whose similarity is above the threshold. (Not optimal.)
  - Optimal Solution: Stable Marriage
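The greedy procedure described above can be written in a few lines. This sketch assumes the similarity table is a `{(e1, e2): similarity}` dict; sorting once and skipping used entities is equivalent to repeatedly picking the best pair and deleting its row and column.

```python
def greedy_alignment(scores, threshold):
    """Injective alignment built by always taking the most similar free pair."""
    used_left, used_right = set(), set()
    alignment = []
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    for (e1, e2), s in ranked:
        if s <= threshold:
            break                      # nothing left above the threshold
        if e1 not in used_left and e2 not in used_right:
            alignment.append((e1, e2, s))
            used_left.add(e1)          # "delete its members from the table"
            used_right.add(e2)
    return alignment
```

A small case shows why the result is not optimal: with scores (a, x) = 0.9, (a, y) = 0.8, (b, x) = 0.8, (b, y) = 0.1 and threshold 0.3, the greedy choice takes (a, x) and then has nothing left for b, for a total of 0.9, while pairing a with y and b with x would total 1.6.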

Existing Works
(T marks a supported feature. The feature columns named in the slide header are Aggregation, Lexical, Structure, String, Semantic, Instance; the column position of each T was lost in extraction, so the marks are listed as extracted.)

Method       Year  Organization   Project Leader  Automatic   Feature marks
OntoMorph    1997  S. California  Chalupsky       Semi        T
U.S. Army    1999  DARPA                          Semi        T
Smart        1999  Stanford       Fridman, Noy    Semi        TT
Chimaera     1999  Stanford       McGuinness      Semi        TT T
Prompt       2001  Stanford       Noy, Musen      Semi        TT
InfoSleuth   2001  Amsterdam      Ding            Semi        TT
A. Prompt    2002  Stanford       Noy, Musen      Semi        TT T
Glue         2002  Illinois       Doan            Automatic   TTT T
IF Map       2003  Southampton    Kalfoglou       Automatic   T T
NOM          2003  Karlsruhe      Ehrig           Automatic   TTTTT
QOM          2004  Karlsruhe      Ehrig           Automatic   TTTT
CROSI        2005  Southampton    Kalfoglou       Automatic   TT T

Anchor-prompt

An Example: Anchor-PROMPT Method
- Anchor-PROMPT (an extension of PROMPT) is an ontology merging and alignment tool that finds possibly matching terms.
- Implemented in Protégé: http://protege.stanford.edu
- Incremental algorithm:
  - Takes as input two ontologies and a set of anchors (pairs of related terms).
  - Anchors are identified with the help of string-based techniques, or defined by a user.
  - It then refines them based on the ontology structures and user feedback.

The PROMPT Algorithm
[Figure: flow of the algorithm through the steps "Make initial suggestions", "Select the next operation", "Perform automatic updates", "Find conflicts", "Make suggestions"]

Example: merge-classes
[Figure, two slides: diagrams of the merge-classes example with the classes Agency employee, Agent, Employee, Customer, Traveler and the links "subclass of", "agent for", "has client"]

Analyzing Global Properties Locally
- Global properties:
  - classes that have the same sets of slots
  - classes that refer to the same set of classes
  - slots that are attached to the same classes
- Local context:
  - incremental analysis
  - consider only the concepts that were affected by the last operation

The PROMPT Operation Set
- Extends the OKBC operation set with ontology-merging operations:
  - merge classes
  - merge slots
  - merge instances
  - copy of a class
    - deep or shallow
    - with or without subclasses
    - with or without instances
  - …

After a User Performs an Operation
- For each operation:
  - perform the operation
  - consider possible conflicts
    - identify conflicts
    - propose solutions
  - analyze local context
  - create new suggestions
  - reinforce or downgrade existing suggestions

Conflicts
- Conflicts that PROMPT identifies:
  - name conflicts
  - dangling references
  - redundancy in a class hierarchy
  - slot-value restrictions that violate class inheritance

Example: merge-classes
[Figure label: Agent]

Operation Steps: merge-classes
- Own slots and their values for the new class: ask the user in case of conflicts, or use preferences.
- Template slots for the new class: union of the template slots of the original classes.
- Subclasses and superclasses for the new class
- Conflicts
- Suggestions

Template Slots
- Copy template slots that don’t exist in the merged ontology.
[Figure labels: Agent; agent for]

Template Slots
- Attach the slots that have already been mapped.
[Figure labels: Agent; has client; client]

Subclasses And Superclasses
- If a superclass (subclass) exists, re-establish the links.
[Figure labels: Agent; Employee; Agency employee; superclass]

Dangling References
[Figure labels: Agent; agent for; facet value Customer (for example, allowed class); facet value Customer _temp; dummy frame]

Additional Suggestions: Merge Slots
- If slot names at the merged class are similar, suggest merging the slots.
[Figure labels: Agent; client; has client]

Additional Suggestions: Merge Classes
- If the set of classes referenced by the merged class is the same as the set of classes referenced by another class, suggest a merge.
[Figure labels: Agent; Agency employee; Reservation; Client; has clients; handles reservations]

Additional Suggestions: Merge Classes
- If names of superclasses (subclasses) of the merged class are similar, suggest merging the classes.
[Figure labels: Agent; Employee; Agency employee; superclass]

To Summarize
- Perform the actual operation.
- For the concepts (classes, slots, and instances) directly attached to the operation arguments:
  - perform global analysis for new suggestions;
  - perform global analysis for new conflicts.

Non-local context Classes directly referenced by C Slots in C Context C

Anchor-PROMPT: Using Non-Local Contexts
- Input: a set of anchor pairs
- Output: a set of related terms with similarity scores
- Where do anchors come from?
  - Lexical matching
  - Interactive tools
  - User-specified
[Figure labels: Ontology 1; Ontology 2]

Generating Paths in the Graph

Similarity Score
1. Generate a set of all paths (of length < L).
2. Generate a set of all possible pairs of paths of equal length.
3. For each pair of paths and for each pair of nodes in identical positions in the paths, increment the similarity score.
4. Combine the similarity scores for all the paths.
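The scoring steps above can be sketched on small graphs. Several details are assumptions made for illustration: ontologies are given as adjacency dicts, paths run from one anchor to another, path length is counted in nodes, and the per-path scores are combined by simple summation.

```python
from collections import Counter
from itertools import permutations


def paths_between(graph, start, end, max_len):
    """All simple paths from start to end with fewer than max_len nodes."""
    found, stack = [], [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == end and len(path) > 1:
            found.append(path)
            continue
        if len(path) + 1 >= max_len:
            continue                       # extending would reach the length limit
        for nxt in graph.get(path[-1], ()):
            if nxt not in path:            # keep paths simple (no repeated nodes)
                stack.append(path + [nxt])
    return found


def anchor_prompt_scores(graph1, graph2, anchors, max_len=5):
    """Count how often two terms occupy the same position on paired paths."""
    scores = Counter()
    for (a1, a2), (b1, b2) in permutations(anchors, 2):
        for p1 in paths_between(graph1, a1, b1, max_len):
            for p2 in paths_between(graph2, a2, b2, max_len):
                if len(p1) == len(p2):     # only pair paths of equal length
                    for n1, n2 in zip(p1, p2):
                        scores[(n1, n2)] += 1
    return scores
```

With hypothetical graphs TRIAL → PROTOCOL → PERSON and Trial → Design → Person, and the anchors (TRIAL, Trial) and (PERSON, Person), the pair (PROTOCOL, Design) receives a score of 1 because both terms sit in the middle of the only pair of paths.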

Equivalence Groups

Anchor-PROMPT: Initial Results

TRIAL                  Trial
PERSON                 Person
CROSSOVER              Crossover
PROTOCOL               Design
TRIAL-SUBJECT          Person
INVESTIGATORS          Person
POPULATION             Action_Spec
PERSON                 Character
TREATMENT-POPULATION   Crossover_arm

Knowledge Model Assumptions The only assumption: An OKBC-compliant knowledge model

Protégé-2000  An environment for Ontology development Knowledge acquisition  Intuitive direct-manipulation interface  Extensibility Ability to plug in new components

Ontologies in Protégé-2000

Protégé-2000 plugins
- Domain-specific user-interface plugins
- Alternative back ends for archival storage
- Utility programs for knowledge-acquisition tasks
- End-user applications

Protégé-based PROMPT tool
- Protégé-2000:
  - has an OKBC-compatible knowledge model
  - allows building extensions through a plug-in mechanism
  - can work as a knowledge-base server for the plug-ins

The PROMPT tool

The PROMPT tool features
- Setting a preferred ontology
- Maintaining the user’s focus
- Providing feedback to the user
- Preserving original relations:
  - subclass-superclass relations
  - slot attachment
  - facet values
- Linking to the direct-manipulation ontology editor
- Logging operations

Coincidence based refinement

Introduction: Framework  The final steps of each phase

Coincidence or Being the same? Geometrically Equivalent Not the same however! Perfectly the same! Perfectly coinciding Less the same because they coincide less! More the same because they coincide more!

Background – Metric Space
Our interpretation of the output of phase #1: a distance d on a set X such that
- d(p, q) > 0 iff p ≠ q; d(p, p) = 0
- d(p, q) = d(q, p)
- d(p, q) ≤ d(p, r) + d(r, q)
Then d is a metric, and X is a metric space.
[Figure labels: p; q; d?; S]
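The three axioms can be checked mechanically on a finite set of points, which is a quick way to see whether a distance produced by an earlier phase really behaves as a metric. This is a brute-force sketch; the tolerance parameter is an assumption to absorb floating-point noise.

```python
from itertools import product


def is_metric(points, d, tol=1e-9):
    """Check positivity, symmetry and the triangle inequality on `points`."""
    for p, q in product(points, repeat=2):
        if p == q and abs(d(p, q)) > tol:
            return False                       # d(p, p) must be 0
        if p != q and d(p, q) <= 0:
            return False                       # d(p, q) > 0 for distinct points
        if abs(d(p, q) - d(q, p)) > tol:
            return False                       # symmetry
    for p, q, r in product(points, repeat=3):
        if d(p, q) > d(p, r) + d(r, q) + tol:
            return False                       # triangle inequality
    return True
```

For instance, the absolute difference on numbers passes the check, while a one-directional distance such as `max(q - p, 0)` fails it.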

Background – Typed Graphs
- G(V, E, T): V is the set of vertices, E the set of edges, and T the set of types.
- Edges have types, such as t_0, t_1, and t_2 in the figure.

Refinement of Ontology Alignment: Problem Specification
- O_1 or G_1: the first ontology, interpreted as the first typed graph.
- O_2 or G_2: the second ontology, interpreted as the second typed graph.
- d: a metric giving the distance between each pair of nodes from O_1 and O_2.
- Coincidence Examiner (for refinement): rates how good each possible matching is, based on how much matching each pair of nodes improves the degree to which the two ontologies appear to coincide.

Desired Properties for the Coincidence Examiner
[Figure labels: Big Benefit; Modest Benefit; Low Benefit; Low Penalty; Modest Penalty; Big Penalty]

Our Proposed Formula f and g: normalisation factors Both restrictively increasing Range outside certain neighbourhoods around origin

Commentary
- No symmetry
- Quasi-metric spaces
- No way to guarantee the triangle inequality, but rational
- No tractable solution for the typed graph isomorphism problem
- Heuristics for dealing with it in polynomial time
- OWL-related heuristics

OWL-based Heuristics (1/2)
Discard and Contraction:
- IS-A: contraction/expansion
- Disjoint: distance-based removal
- Equivalent: accumulation
- owl:FunctionalProperty: mapped only to a functional property

OWL-based Heuristics (2/2)
- No Cross!

Altogether (Our Algorithm)
1. Input O and O’.
2. Apply a threshold-based refinement on O and O’.
3. Apply the recipes for Discard and Contraction to the result.
4. Weight all the remaining possible mappings.
5. Expand the contracted parts back.
6. Output the mappings along with their weights.

Example
[Figure panels: Before Contraction; After Contraction; Final Mapping]

The End
