February 2001 Gio Wiederhold Stanford University

Slides:



Advertisements
Similar presentations
Chapter 10: Designing Databases
Advertisements

Database Systems: Design, Implementation, and Management Tenth Edition
Helping people find content … preparing content to be found Enabling the Semantic Web Joseph Busch.
©Ian Sommerville 2006Software Engineering, 8th edition. Chapter 8 Slide 1 System models.
Xyleme A Dynamic Warehouse for XML Data of the Web.
Gio Wiederhold SimQL 1 SimQL Accessing Simulation as Services to Information Systems Gio Wiederhold July 2001.
File Systems and Databases
A Review of Ontology Mapping, Merging, and Integration Presenter: Yihong Ding.
Modified from Sommerville’s originalsSoftware Engineering, 7th edition. Chapter 8 Slide 1 System models.
March 2000 Gio XIT 1 Increasing the Precision when Obtaining Information from the Web Gio Wiederhold Stanford University 4 April 2000 related report: www-db.stanford.edu/pub/gio/1999/miti.htm.
Modified from Sommerville’s originalsSoftware Engineering, 7th edition. Chapter 8 Slide 1 System models.
August 2000 Gio vdR 1 Increasing the Precision of Semantic Interoperation Gio Wiederhold Stanford University August 2000 Reind van de Riet celebration.
PROMPT: Algorithm and Tool for Automated Ontology Merging and Alignment Natalya F. Noy and Mark A. Musen.
Methodology Conceptual Database Design
March 2000 Gio XIT 1 Increasing the Precision of Semantic Interoperation Gio Wiederhold Stanford University March 2000 report: www-db.stanford.edu/pub/gio/1999/miti.htm.
MDC Open Information Model West Virginia University CS486 Presentation Feb 18, 2000 Lijian Liu (OIM:
Chapter 1 Database Systems. Good decisions require good information derived from raw facts Data is managed most efficiently when stored in a database.
Knowledge Mediation in the WWW based on Labelled DAGs with Attached Constraints Jutta Eusterbrock WebTechnology GmbH.
Semantic Interoperability Jérôme Euzenat INRIA & LIG France Natasha Noy Stanford University USA.
Chapter 10 Architectural Design
©Ian Sommerville 2000 Software Engineering, 6th edition. Chapter 7 Slide 1 System models l Abstract descriptions of systems whose requirements are being.
Chapter 4 System Models A description of the various models that can be used to specify software systems.
System models Abstract descriptions of systems whose requirements are being analysed Abstract descriptions of systems whose requirements are being analysed.
Database Systems: Design, Implementation, and Management Ninth Edition
Approaching a Problem Where do we start? How do we proceed?
1 Schema Registries Steven Hughes, Lou Reich, Dan Crichton NASA 21 October 2015.
Chapter 7 System models.
Database Design and Management CPTG /23/2015Chapter 12 of 38 Functions of a Database Store data Store data School: student records, class schedules,
System models l Abstract descriptions of systems whose requirements are being analysed.
Modified by Juan M. Gomez Software Engineering, 6th edition. Chapter 7 Slide 1 Chapter 7 System Models.
Software Engineering, 8th edition Chapter 8 1 Courtesy: ©Ian Somerville 2006 April 06 th, 2009 Lecture # 13 System models.
Sommerville 2004,Mejia-Alvarez 2009Software Engineering, 7th edition. Chapter 8 Slide 1 System models.
Christoph F. Eick University of Houston Organization 1. What are Ontologies? 2. What are they good for? 3. Ontologies and.
S calable K nowledge C omposition Ontology Interoperation January 19, 1999 Jan Jannink, Prasenjit Mitra, Srinivasan Pichai, Danladi Verheijen, Gio Wiederhold.
©Ian Sommerville 2004Software Engineering, 7th edition. Chapter 8 Slide 1 System models.
Issues in Ontology-based Information integration By Zhan Cui, Dean Jones and Paul O’Brien.
Object storage and object interoperability
Achieving Semantic Interoperability at the World Bank Designing the Information Architecture and Programmatically Processing Information Denise Bedford.
A Portrait of the Semantic Web in Action Jeff Heflin and James Hendler IEEE Intelligent Systems December 6, 2010 Hyewon Lim.
Semantic Interoperability in GIS N. L. Sarda Suman Somavarapu.
Department of Mathematics Computer and Information Science1 CS 351: Database Management Systems Christopher I. G. Lanclos Chapter 4.
GoRelations: an Intuitive Query System for DBPedia Lushan Han and Tim Finin 15 November 2011
Viewpoint Modeling and Model-Based Media Generation for Systems Engineers Automatic View and Document Generation for Scalable Model- Based Engineering.
Of 24 lecture 11: ontology – mediation, merging & aligning.
1 © 2013 Cengage Learning. All Rights Reserved. This edition is intended for use outside of the U.S. only, with content that may be different from the.
Engineering, 7th edition. Chapter 8 Slide 1 System models.
Database Systems: Design, Implementation, and Management Tenth Edition
Introduction To DBMS.
UNIFIED MEDICAL LANGUAGE SYSTEMS (UMLS)
The Semantic Web By: Maulik Parikh.
ITEC 3220A Using and Designing Database Systems
Definition CASE tools are software systems that are intended to provide automated support for routine activities in the software process such as editing.
Distribution and components
Component Based Software Engineering
Abstract descriptions of systems whose requirements are being analysed
Using Knowledge to Exploit Data
Tools For Resolving Heterogeneity Computer Science Department
8. Educational Challenges
File Systems and Databases
Lec 3: Object-Oriented Data Modeling
Obtaining Precision when Integrating Information
Semantic Precision for Web-based Interoperation
Software Architecture
Stanford University, CSD
Data Model.
Database Systems Instructor Name: Lecture-3.
Ontology-Based Approaches to Data Integration
Lecture 10 Structuring System Requirements: Conceptual Data Modeling
Presentation transcript:

February 2001 Gio Wiederhold Stanford University K C An Algebraic Approach to Articulate Ontologies for Information Integration February 2001 Gio Wiederhold Stanford University 9/16/2018 Gio Wiederhold SKC 1

What are Ontologies? Ontologies list the terms and their relationships that allow communication among partners in enterprises (in machine-readable form) Relationships determine meaning - parent, school, company Databases use ontologies during design in their E-R diagrams (Implicitly) and represent the leaf nodes in their schemas Knowledge-bases use ontologies (often implicitly) add class definition (to hold instances), constraints, and, sometimes, operations among the terms 9/16/2018 Gio Wiederhold SKC 2

Functions of Ontologies . Enable Precision in Understanding People = designers, implementors, users, maintainers Systems = implementors = users = maintainers Share the Cost of Knowledge Acquistion & Maintenance reuse encoded knowledge, remain up-to-date as domains change Enable Information Interoperation * Define the terms that link domains 9/16/2018 Gio Wiederhold SKC 3

Ancestors of Ontologies . Lexicons: collect terms used in informtion systems Taxonomies: categorize, abstract, classify terms Schemas of databases: attributes, ranges, constraints Data dictionaries: systems with multiple files, owners Object libraries: grouped attributes, inherit., methods Symbol tables: terms bound to implemented programs Domain object models: (XML DTD): interchange terms . . . More Knowledge formalized 9/16/2018 Gio Wiederhold SKC 4

Establishing Ontologies . Top-down: Commonly acceptable UPPER layers Domain-specific Sharing tools Object based Bottom-up Pragmatic, TASK-specific collections Database schemas and models 9/16/2018 Gio Wiederhold SKC 5

Heterogeneity among Domains If interoperation involves distinct domains mismatch ensues Autonomy conflicts with consistency, Local Needs have Priority, Outside uses are a Byproduct Heterogeneity must be addressed Platform and Operating Systems  Representation and Access Conventions  Naming and Ontology  9/16/2018 Gio Wiederhold SKC 6

Semantic Mismatches Information comes from many autonomous sources Differing viewpoints (by source) differing terms for similar items { lorry, truck } same terms for dissimilar items trunk(luggage, car) differing coverage vehicles (DMV, AIA) differing granularity trucks (shipper, manuf.) different scope student museum fee, Stanford Hinders use of information from disjoint sources missed linkages loss of information, opportunities irrelevant linkages overload on user or application program Poor precision when merged Still ok for web browsing , poor for business & science 9/16/2018 Gio Wiederhold SKC 7

Need for precision Information Wall More precision is needed as data volume increases --- a small error rate still leads to too many errors False Positives have to be investigated ( attractive-looking supplier - makes toys apparent drug-target with poor annotation ) False Negatives cause lost opportunities, suboptimal to some degree False positives = poor precision typically cost more than false negatives = poor recall data errors information quantity human limit acceptable limit human with tools? Information Wall adapted from Warren Powell, Princeton Un. 9/16/2018 Gio Wiederhold SKC 8

Ontology Sharing Three Alternatives Create a committee to define everybody’s terms Takes many years, until people are worn out Ignored when changes make deviation necessary Get all terms and put them into large model [ Cyc, UMLS, Federated Schemas, . . . ] Can be rapid Ignores conflicts Hard to maintain (requires committee) Keep all Terms distinct, except where sharing Requires initial effort Empowers participants 9/16/2018 Gio Wiederhold SKC 9

Proposed Language Solutions Specify and define terminology usage: ontology Domain-specific ontologies XML DTD assumption Small, focused, cooperating groups high quality, some examples - genomics, arthritis, Shakespeare plays allows sharable, formal tools ongoing, local maintenance affecting users - annual updates poor interoperation, users still face inter-domain mismatches Cannot achieve globally consistency wonderful for users and their programs too many interacting sources long time to achieve, 2 sources (UAL, BA), 3 (+ trucks), 4, … all ? costly maintenance, since all sources evolve no world-wide authority to dictate conformance 9/16/2018 Gio Wiederhold SKC 10

An unsolved problem Common assumption in assembling and integrating distributed information resources The language used by the resources is the same Sub languages used by the resources are subsets of a globally consistent language This assumption is provably false Working towards the goal of globally consistency is 1. naïve -- the goal cannot be achieved 2. inefficient -- languages are efficient in local contexts 9/16/2018 Gio Wiederhold SKC 11

General Ontologies? Have all the Knowledge together simple for customers of KBs hard for owners of KBs Large KB will cover multiple domains created by a committee -- slow maintained by a committee -- costly Differences in level of abstraction -- efficiency homeowner: nail carpenter: sinker, brad, boxnail, . . . 9/16/2018 Gio Wiederhold SKC 12

Structural Heterogeneity 9/16/2018 Gio Wiederhold SKC 13

Domains and Consistency . a domain will contain many objects the object configuration is consistent within a domain all terms are consistent & relationships among objects are consistent context is implicit No committee is needed to forge compromises * within a domain Domain Ontology Compromises hide valuable details 9/16/2018 Gio Wiederhold SKC 14

SKC grounded definition . Ontology: a set of terms and their relationships Term: a reference to real-world and abstract objects Relationship: a named and typed set of links between objects Reference: a label that names objects Abstract object: a concept which refers to other objects Real-world object: an entity instance with a physical manifestation 9/16/2018 Gio Wiederhold SKC 15

Domain-specific Expertise . Knowledge needed is huge Partition into natural domains Determine domain responsibility and authority Empower domain owners Provide tools Consider interaction Society of specialists 9/16/2018 Gio Wiederhold SKC 16

An Ontology Algebra A knowledge-based algebra for ontologies The Articulation Ontology (AO) consists of matching rules that link domain ontologies Intersection create a subset ontology keep sharable entries Union create a joint ontology merge entries Difference create a distinct ontology remove shared entries 9/16/2018 Gio Wiederhold SKC 17

Sample Operation: INTERSECTION Result contains shared terms Terms useful for purchasing Source Domain 1: Owned and maintained by Store Source Domain 2: Owned and maintained by Factory 9/16/2018 Gio Wiederhold SKC 18

INTERSECTION support Articulation ontology Matching rules that use terms from the 2 source domains Terms useful for purchasing Store Ontology Factory Ontology 9/16/2018 Gio Wiederhold SKC 19

color =table(colcode) Sample Intersections Articulation ontology matching rules : size = size color =table(colcode) style = style Ana- tomy Material inventory {...} Employees { . . . } Machinery { . . . } Processes { . . . } Shoes { . . . } Shoe Factory Shoe Store Shoes { . . . } Customers { . . . } Employees { . . . } {. . . } Hard- ware foot = foot Employees Employees Nail (toe, foot) Nail (fastener) . . . . . . Department Store 9/16/2018 Gio Wiederhold SKC 20

Other Basic Operations UNION: merging entire ontologies DIFFERENCE: material fully under local control Arti- culation ontology typically prior intersections 9/16/2018 Gio Wiederhold SKC 21

Features of an algebra Operations can be composed Operations can be rearranged Alternate arrangements can be evaluated Optimization is enabled The record of past operations can be kept and reused 9/16/2018 Gio Wiederhold SKC 22

Sample Processing in HPKB What is the most recent year an OPEC member nation was on the UN security council (SC)? Related to DARPA HPKB Challenge Problem SKC resolves 3 Sources CIA Factbook ‘96 (nation) OPEC (members, dates) UN (SC members, years) SKC obtains the Correct Answer 1996 (Indonesia) Other groups obtained more, but factually wrong answers Problems resolved by SKC Factbook – a secondary source -- has out of date OPEC & UN SC lists Indonesia not listed Gabon (left OPEC 1994) different country names Gambia => The Gambia historical country names Yugoslavia UN lists future security council members Gabon 1999 needed ancillary data 9/16/2018 Gio Wiederhold SKC 23

Interoperation via Articulation At application definition time Match ontologies Establish articulation rules. Record the process At execution time Query rewriting Optimization based on an Ontology Algebra. For maintenance Regenerate rules using the stored formulation 9/16/2018 Gio Wiederhold SKC 24

Semi-automatic approach Provide library of automatic match heuristics Lexical Methods -- spelling Structural Methods -- relative graph position Reasoning-based Methods Nexus  Hybrid Methods Iterative/Non-iterative Methods GUI tool to - display matches and - verify generated matches using human expert - expert can also supply matching rules 9/16/2018 Gio Wiederhold SKC 25

Articulation Generator Being built by Prasenjit Mitra Thesaurus OntA Context-based Word Relator Phrase Relator Driver Semantic Network (Nexus) Structural Matcher Ont1 Ont2 Human Expert 9/16/2018 Gio Wiederhold SKC 26

Lexical Methods Preprocessing rules. -Expert-generated seed rules. e.g., (Match O1.President O2.PrimeMinister) -Context-based preprocessing directives. Thesaurus - synonyms, relationships Distance of words as measure of relatedness. 9/16/2018 Gio Wiederhold SKC 27

Tools to create articulations registration Vehicle ontology Vehicle ontology sales Combine ontology graphs with expert selection based on spelling, graph matching, and a nexus derived from a dictionary (O.E.D.) Suggestions for articulations 9/16/2018 Gio Wiederhold SKC 28

Tools to create articulations Graph matcher for Articulation- creating Expert Vehicle ontology Transport ontology Suggestions for articulations 9/16/2018 Gio Wiederhold SKC 29

continue from initial point Also suggest similar terms for further articulation: by spelling similarity, by graph position by term match repository Expert response: 1. Okay 2. False 3. Irrelevant to this articulation All results are recorded Okay’s are converted into articulation rules 9/16/2018 Gio Wiederhold SKC 30

Candidate Match Repository Term linkages automatically extracted from 1912 Webster’s dictionary * * free, other sources .have been processed. Based on processing headwords ý definitions using algebra primitives Notice presence of 2 domains: chemistry, transport 9/16/2018 Gio Wiederhold SKC 31

Using the match repository 9/16/2018 Gio Wiederhold SKC 32

Navigating the match repository 9/16/2018 Gio Wiederhold SKC 33

Relative Arc Importance PageRank (Google) limitations node oriented high rank to words with little semantic value conjunctions, articles And The prepositions, pronouns to it Relative arc importance contribution of source rank to target rank 9/16/2018 Gio Wiederhold SKC 34

ArcRank For All source s and target t nodes in graph sort outgoing , rank by sorted order sort incoming , rank by sorted order for each arc compute In ranking Equal values take same rank Ranks numbered consecutively 9/16/2018 Gio Wiederhold SKC 35

All Pairs Similarity Compute similarity value for all node pairs product of inbound arc importance vectors product of outbound arc importance vectors similarity = Similarity Matrix Initial state: nodes similar only to themselves Node substitution: terms replace similar ones Iterative convergence: bounded substitution 9/16/2018 Gio Wiederhold SKC 36

Contents Examples Verb (Educate) Adverb (Ever) Proper Noun (Scotland -- undefined term 9/16/2018 Gio Wiederhold SKC 37

Examples (Verb) 9/16/2018 Gio Wiederhold SKC 38

Examples (Adverb) 9/16/2018 Gio Wiederhold SKC 39

Examples (Proper Noun) 9/16/2018 Gio Wiederhold SKC 40

Country Graphs 9/16/2018 Gio Wiederhold SKC 41

To be matched to 9/16/2018 Gio Wiederhold SKC 42

Knowledge Composition Composed knowledge for applications using A,B,C,E Articulation knowledge for U (A B) (B C) (C E) Articulation knowledge Legend: U : union U : intersection U (C E) Articulation knowledge for Knowledge resource E Knowledge resource C (A B) U U (B C) U (C D) Knowledge resource A Knowledge resource B Knowledge resource D 9/16/2018 Gio Wiederhold SKC 43

Primitive Operations Model and Instance Constructors create object create set Connectors match object match set Editors insert value edit value move value delete value Converters object - value object indirection reference indirection Unary Summarize -- structure up Glossarize - list terms Filter - reduce instances Extract - circumscription Binary Match - data corrobaration Difference - distance measure Intersect - schem discovery Blend - schema extension 9/16/2018 Gio Wiederhold SKC 44

Future: exploiting the result Avoid n2 problem of interpreter mapping as stated by Swartout as an issue in HPKB year 1 Result has links to source Processing & query evaluation is best performed within Source Domains & by their engines 9/16/2018 Gio Wiederhold SKC 45

SKC Synopsis Research: Reliable query answers from heterogeneous, imperfect data sources Sources: General: CIA World Factbook ‘96, UN-www, OPEC-www Webster’s Dictionary, Thesaurus, Oxford English Dictionary Topical: OPEC, BattleSpace Sensors, Logistics Servers Client: DARPA High Performance Knowledge Base project Theory: Rule-based algebra Translation & Composition primitives 9/16/2018 Gio Wiederhold SKC 46

Domain Specialization Knowledge Acquisition (20% effort) & Knowledge Maintenance (80% effort *) to be performed by Domain specialists Professional organizations Field teams of modest size autonomously maintainable Empowerment * based on experience with software 9/16/2018 Gio Wiederhold SKC 47

Innovation in SKC No need to harmonize full ontologies Focus on what is critical for interoperation Rules specific for articulation Tools for creation and maintenance Maintenance is distributed to n sources to m articulation agents Potentially many sets of articulation rules is m < n2 , depends on semantic architecture density a research question 9/16/2018 Gio Wiederhold SKC 48

Backup Viewgraphs 9/16/2018 Gio Wiederhold SKC 49

Summary . Algebra enables Interoperation by dealing explicitly with differences by knowledge identifying maintenance domains keeping sources autonomous Assumes domain has a common ontology composing domain ontologies requires the algebra to manage the linkages where articulation occurs processes are best executed within the domains Knowledge about articulation is disjoint allows integration specialists to work independently supports multiple intersections and views Maintenance is structured and partitioned 9/16/2018 Gio Wiederhold SKC 50

Current Directions Experience with real world (imperfect) data confirms validity of our approach Expert sources are better maintained than general sources Rules applied to multiple sources provide more reliable and accurate query results Component architecture enables scalable, maintainable knowledge base development Developing proof of concept environment with HPKB standard knowledge base connectivity interface 9/16/2018 Gio Wiederhold SKC 51

Components of Ontologies . Vocabularies words as handles for classes and concepts from database schemas. textbooks, catalog indexes identified with their domain or context Relationships meaning is primarily defined through relationships Employee of a Company; Nails in a Shoe Annotations material to clarify the meaning of the terms contributed by users as well as original authors best with some examples 9/16/2018 Gio Wiederhold SKC 52

Status September 1997 . Base HPKB funding from AFOSR New World Vistas some industrial co-funding Prior work supported through Commercenet support for common representation, an interlingua Acquiring ontologies that are interesting to HPKB projects not trivial, I.e., represent realistic activities intersectable Logistics: DoD CIM, CIA, Cyc, . . . Starting smart students Integrating into architecture managed by TFS 9/16/2018 Gio Wiederhold SKC 53

Information Flow for Training Initiative sample scenarios ISI scenario language doctrine TRADOC Scenarios scenario justification scenario refinement Objectives trainer / controller mediator knowledge base aggregation/ analysis/ evaluation Requirements exercise design Legend Data collection Probe- point settings sources tasks draft 1 explosion aggregation 9/16/2018 Gio Wiederhold SKC 54

Interlingua(s) Interlingua: Object Exchange Model OEM Interlingua: Knowledge Interchange Format KIF Query: Knowledge Query and Manipulation Language KQML (PACKAGE :FROM ap001 :TO ap002 :CONTENT (MSG :TYPE query :CONTENT-LANGUAGE KIF :CONTENT (and (document (author@biblio ?a) (title@sybase ?t)) (eq “Jeff Ullman” ?a))) Interlingua: Object Exchange Model OEM Query : Mediator Specification Language MSL <document {<author AUTHOR> <title TITLE>}:- <biblioentry {<author AUTHOR>}>@biblio AND <inproceedings {<title TITLE>}> @sybase AND Equal(AUTHOR, “Jeff Ullman”) { OID, LABEL, TYPE, VALUE } 9/16/2018 Gio Wiederhold SKC 55

Support for KB-Algebra Ontolingua [Gruber, Fikes @ Stanford KSL]: Repository for Domain Terminologies Used for mechanical design, bibliographies, catalogs LOOM [MacGregor@ USC ISI]: Classification-based Expert System Helps in structuring and processing ontologies PROTÉGÉ [Musen@ Stanford MIS] Reuse Penguin [Barsalou, Keller@ Stanford MIS, CIFE]: Object manipulation based on Relational Algebra Used for genetics laboratory, building design 9/16/2018 Gio Wiederhold SKC 56