Ontology Maintenance with an Algebraic Methodology: a Case Study
Jan Jannink, Gio Wiederhold
Presented by: Lei Lei
Challenges
* Obstacle: autonomy of diverse knowledge sources
* Data volatility and volume increase cost
* Major challenge: establishing and maintaining the application-specific portion of knowledge sources
An Algebraic Approach
* Construct virtual knowledge bases geared to a specific application
* Use composable operators to transform contexts into contexts
* Operators express relevant parts of a source and the conditions using rules
* Rules define a valid context transformation
On-line Dictionary: Webster
* Autonomously maintained source, used to develop a novel thesaurus application
* 120,000 entries, two million words
* Semi-annual updates
* Errors and inconsistencies help test robustness
Target Application Construct a graph of the definitions to determine related terms, and automatically generate thesaurus entries
Related Work
* Ontology composition (Wiederhold 1994)
* Rule-based approach to semantic integration (Bergamaschi et al. 1999)
* Semantic reconciliation (Siegel 1991)
* Uschold et al.
* Specification morphisms (Smith 1993)
* WordNet system (Miller et al. 1990)
* WHIRL (Cohen 1998)
* PageRank (Page & Brin 1998)
* Latent semantic indexing (Deerwester 1990)
* Hypertext authority (Kleinberg 1998)
Outline
* Algebra Usage Scenario
* Background
* Context Creation
* Ontology Maintenance
* Future Work
* Conclusion
* My Evaluation
Typical Algebra Usage Scenario
* A minimal sufficient set of linkages between items in different resources
Background: Algebraic Operators
* Canonical unary operator: establishes and refines a context within which the source knowledge meets the application requirements
Background (Cont.): Semantic Context
* No global notion of consistency
* Defined as objects that encapsulate other objects
* Congruity: relevance of source information to the application
* Similarity: equivalent and mergeable objects between different sources
Rule Language (Cont.)
* Allow uninterpreted components of an object to become attributes of the object
* Constructors: create new objects
* Constructors: generate proxy objects
* Editors & converters: modify the objects
Object Model (Cont.)
* Subsumes existing models
* Only objects have an identity to which others can refer
* Corresponds to XML supplemented with object identity
* Rich enough to model complex relationships
Context Creation: Summarize Operator (S operator)
* Transforms source data based on a predicate
* Create object: encapsulates & populates
* Data classification: groups source objects into equivalence classes
* Syntax: (given contexts c1, c2, a matching rule e)
Context Creation (Cont.)
1. Predicate e partitions the objects of c1 into n equivalence classes
2. c2 consists of n+1 values: e, s1, s2, …, sn
3. One class is an exception class, holding the objects that do not match e
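The partitioning step can be illustrated with a small Python sketch. This is not the authors' implementation: the function name `summarize`, the `EXCEPTION` label, and the first-letter rule are all hypothetical, chosen only to show a rule splitting a context into classes plus one exception class.

```python
from collections import defaultdict

# Hypothetical label for the class of objects that the rule does not match.
EXCEPTION = "exception"

def summarize(objects, rule):
    """Group objects by the value the rule assigns them.

    `rule` returns a class key, or None when the object does not match;
    unmatched objects are collected in the exception class.
    """
    classes = defaultdict(list)
    for obj in objects:
        key = rule(obj)
        classes[EXCEPTION if key is None else key].append(obj)
    return dict(classes)

def by_first_letter(word):
    # Illustrative rule (an assumption): classify head words by first letter.
    return word[0].lower() if word and word[0].isalpha() else None

c1 = ["Egoism", "egotism", "Water Buffalo", "3-D", ""]
c2 = summarize(c1, by_first_letter)
```

Here `c2` holds one class per first letter and a single exception class for the entries the rule could not classify.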
Example with Webster’s Dictionary Automatic Thesaurus Extraction from Dictionary
Example (Cont.)
Construct a directed graph from the definitions:
1. Each head word and definition grouping is a node
2. Each word in a definition node is an arc to the node having that head word
Definition from the dictionary data for Egoism
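A minimal sketch of the two construction rules, assuming the dictionary has already been reduced to a mapping from head word to definition text. The entries below are invented toy data, not Webster's text, and `build_graph` is a hypothetical name.

```python
import re

def build_graph(entries):
    """Return {head word: set of head words appearing in its definition}."""
    # Rule 1: each head word with its definition is a node.
    nodes = {head.lower(): text.lower() for head, text in entries.items()}
    graph = {head: set() for head in nodes}
    # Rule 2: each definition word that is itself a head word becomes an arc.
    for head, text in nodes.items():
        for word in re.findall(r"[a-z]+", text):
            if word in nodes:
                graph[head].add(word)
    return graph

# Toy entries, invented for illustration only.
toy_entries = {
    "egoism": "excessive love of self",
    "self": "one's own person",
    "love": "strong affection",
    "selfishness": "egoism in conduct",
}
graph = build_graph(toy_entries)
```

Words with no matching head word ("excessive", "of") simply produce no arc in this sketch; the real conversion has to deal with stemming, abbreviations, and multi-word head words.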
Context Creation (Cont.)
* Syllable and accent markers in head words
* Misspelled head words
* Mis-tagged fields
* Stemming and irregular verbs (Hopelessness)
* Common abbreviations in definitions (etc.)
* Undefined words with common prefixes (un-)
* Multi-word head words (Water Buffalo)
* Undefined hyphenated and compound words
Set 99% accuracy in the conversion from data to graph structure
Constructing the Congruity Expression
* An object that represents the entire source
* Subdivided into chunks: one head word, one definition
* Expresses the congruity relationship between the dictionary and the thesaurus application
Ontology Maintenance Context Refinement Return the ten longest head words of the dictionary
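As a rough illustration of this refinement query in plain Python (not in the algebra's rule language), assuming the head words are available as a list; `longest_head_words` is a hypothetical helper.

```python
import heapq

def longest_head_words(head_words, k=10):
    """Return the k longest distinct head words, longest first.

    Ties in length are broken alphabetically so the result is deterministic.
    """
    return heapq.nlargest(k, set(head_words), key=lambda w: (len(w), w))

sample = ["egoism", "hopelessness", "water buffalo", "un", "egoism"]
top_two = longest_head_words(sample, k=2)
```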
Maintaining the Ontology
* Changes in the source help correct and extend the dictionary
* Maintain statistics with the S operator when extracting the relevant parts of the dictionary
* Note which rules are no longer needed
* A comparison of the terms reveals new errors
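One way to picture the rule statistics is to count how often each rule fires on a release and flag rules that never fire. The sketch below is an assumption about how such bookkeeping could look; the rule names, predicates, and sample releases are hypothetical.

```python
from collections import Counter

# Hypothetical cleanup rules: name -> predicate over a head word.
RULES = {
    "has_hyphen": lambda w: "-" in w,
    "has_space": lambda w: " " in w,
}

def rule_hits(head_words, rules=RULES):
    """Count how many head words each rule matches."""
    hits = Counter({name: 0 for name in rules})
    for word in head_words:
        for name, matches in rules.items():
            if matches(word):
                hits[name] += 1
    return hits

def unused_rules(hits):
    """Rules that matched nothing: candidates for removal."""
    return sorted(name for name, n in hits.items() if n == 0)

# Invented sample releases of the source.
old_release = ["water-buffalo", "egoism", "ice cream"]
new_release = ["water buffalo", "egoism", "ice cream"]
old_hits = rule_hits(old_release)
new_hits = rule_hits(new_release)
```

Comparing `old_hits` with `new_hits` shows that the hyphen rule stops firing after the update, which marks it as possibly no longer needed.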
Future Work A web based interface to display ArcRank algorithm based on PageRank (
Conclusion
* An on-line dictionary is a good test-bed
* An algebraic approach improves maintainability
* Congruity simplifies identification and handling of changes
* Use Summarize to define and refine a context that prepares the dictionary data for use by the thesaurus service
My Assessment
Strengths
* Decouples the selection of congruent parts of the source data
* Congruity and similarity measures use an algebra rather than a single language
* Mirrors classes using operators of the algebra instead of low-level abstract primitives that are difficult to compose
Weaknesses
* Details of ci’=S(ci) are needed
* Difficult to grasp the capability of the S operator
* Accuracy and error accumulation problem
* Ambiguous rule generation
Questions?