Semantic Precision for Web-based Interoperation

Presentation transcript:

Semantic Precision for Web-based Interoperation
RIACS / NASA AMES
Gio Wiederhold, Stanford University, July 2001
www-db.stanford.edu/people/gio.html
Thanks to Jan Jannink, Shrish Agarwal, Prasenjit Mitra, & Stefan Decker.
11/28/2018 Gio ICEIS 1

Outline
- Setting
- Precision
- Lack of precision
- SKC solution
- Ontologies
- Early results
- Interoperation
- Tool & examples
- Composition and execution
- Evolution and maintenance
- Summary: SKC to general science

Trends, 1998 : 1999
- Users of the Internet: 40% → 52% of U.S. population
- Growth of Net sites (now 2.2M public sites with 288M pages)
- Expected growth in E-commerce by Internet users [BW, 6 Sep. 1999]

segment          1998    1999
books            7.2%    16.0%
music & video    6.3%    16.4%
toys             3.1%    10.3%
travel           2.6%     4.0%
tickets          1.4%     4.2%
Overall          8.0%    33.0% = $9.5 Billion

An unsustainable trend cannot be sustained [Herbert Stein] → new services

[Chart: E-penetration (%) by year, 1998 to 2004, for the Toys centroid (in 1999 ~1% of total market); plotted values 0.3, 1, 3, 9, 27, 81, **]

Growth and Perception
- E-commerce, Gartner: 2000 prediction for 2004: 7.3 T$; revision, 2001 prediction for 2004: 5.9 T$. Drastic loss?
- 50 companies, each after 20% of the market
- Examples: Artificial Intelligence, Databases, Neural networks, E-commerce

[Chart: perception level over time (0 to 15 ...); curve labels: invisible growth, perceived initial growth, perceived growth, extrapolated growth, combinatorial growth, realistic growth, disappointment, failures]

Our* Information Environment
B2B, B2C, G2G, G2C, . . .
- In the past: Scarcity. Customers needed more information to make better decisions.
- Today: Excess. The web provides more information than customers can digest.
- Effect: confusion in decision-making
  - Must I look at all possibly relevant information?
  - What is the penalty for missing something?
  - What is the cost of looking at everything?
  - I am confused, best defer making any decision . . .

Need for precision
- Precision: few wrong or irrelevant results
- More precision is needed as data volume increases: a small error rate still leads to too many errors

[Chart: data error rate vs. information quantity; labels: acceptable limit, human limit, human with tools?, Information Wall (hard to move). Adapted from Warren Powell, Princeton Univ.]
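The volume argument above can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names, the 1% error rate, and the item counts are assumptions, not figures from the slides.

```python
def expected_errors(items_retrieved: int, error_rate: float) -> float:
    """Expected number of wrong or irrelevant results at a fixed error rate."""
    return items_retrieved * error_rate


def max_error_rate(items_retrieved: int, tolerable_errors: int) -> float:
    """Error rate that keeps the absolute number of errors within a fixed budget."""
    return tolerable_errors / items_retrieved


# Hypothetical volumes: the same 1% error rate at two very different scales.
small = expected_errors(100, 0.01)        # about 1 bad result
large = expected_errors(1_000_000, 0.01)  # about 10,000 bad results

# If a person can only check about 10 bad results, the tolerable
# error rate shrinks as the retrieved volume grows.
rate_small = max_error_rate(100, 10)
rate_large = max_error_rate(1_000_000, 10)
```

The absolute error count grows with volume while a human's capacity to investigate stays fixed, so the acceptable rate has to fall.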

Relationships among parameters
[Chart: volume retrieved vs. volume available (0%, 50%, 100%) over the space of methods, showing the percentage actually relevant; labels: perfect recall, perfect precision, Type 1 errors, Type 2 errors, recall, precision. Formulas shown: p = v.relevant / v.retrieved; r = ... / v.available]
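For reference, the sketch below uses the standard information-retrieval definitions of precision and recall over sets of results. It is an assumption that these correspond exactly to the p and r on the slide; the supplier identifiers are made up.

```python
def precision(retrieved: set, relevant: set) -> float:
    """Fraction of the retrieved items that are actually relevant."""
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)


def recall(retrieved: set, relevant: set) -> float:
    """Fraction of the relevant items that were actually retrieved."""
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)


# Hypothetical example: four valid suppliers exist; a search returns
# two of them plus one irrelevant hit.
relevant = {"s1", "s2", "s3", "s4"}
retrieved = {"s1", "s2", "x1"}

p = precision(retrieved, relevant)  # 2 of 3 retrieved are relevant
r = recall(retrieved, relevant)     # 2 of 4 relevant were retrieved
```

The irrelevant hit `x1` lowers precision (excess information), while the missed suppliers `s3` and `s4` lower recall (missed valid information).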

Cost of Error types differs
- Missed Valid Information (False Negatives) causes lost opportunities (cheapest shovel, . . .) and suboptimal decision-making by x valid suppliers
- Excess Information (False Positives) has to be investigated: attractive-looking supplier - makes toys

[Chart: cost-benefit over the space of results, ordered; regions marked 1 and 2]

Having many cases of excess information costs more than some missing information

A Major Cause of Errors
Searches extend over many domains
- Domains have their own terminologies
  - Need autonomy to deal with knowledge growth
- The usage of terms in a domain is efficient
  - Appropriate granularity: mechanic working on a truck vs. logistics manager
  - Shorthand notations: PSU vs. PSU
- Functions differ in scope
  - Payroll versus Personnel: getting paid vs. available (includes contract staff)

Semantic Mismatches
Information comes from many autonomous sources
- Differing viewpoints (by source)
  - differing terms for similar items: { lorry, truck }
  - same terms for dissimilar items: trunk (luggage, car)
  - differing coverage: vehicles (DMV, police, AIA)
  - differing granularity: trucks (shipper, manuf.)
  - different scope: student (museum fee, Stanford)
  - different hierarchical structures: supplier vs. usage
- Hinders use of information from disjoint sources
  - missed linkages: loss of information, opportunities
  - irrelevant linkages: overload on user or application program
- Poor precision when merged
Still ok for web browsing, poor for business & science

Structural Heterogeneity
Same concept? If incorporated in differing structures?

Approach (SKC project)
Scalable Knowledge Composition, Stanford Univ. DB group
- Define terminology in a domain precisely
  - Schemas, XML DTDs → Ontologies
- Develop methods to permit interoperation among differing domains (not integration)
  - Articulation
  - Ontology Algebra
- Develop tools to support the methods
  - Ontology matching

Functions of Ontologies
- Enable Precision in Understanding
  - People = designers, implementors, users, maintainers
  - Systems = sources, mediators, applications
- Share the Cost of Knowledge Acquisition & Maintenance
  - reuse encoded knowledge, remain up-to-date as domains change
- Enable Information Interoperation *
  - Define the terms that link domains

Ancestors of Ontologies
- Lexicons: collect terms used in information systems
- Taxonomies: categorize, abstract, classify terms
- Schemas of databases: attributes, ranges, constraints
- Data dictionaries: systems with multiple files, owners
- Object libraries: grouped attributes, inherit., methods
- Symbol tables: terms bound to implemented programs
- Domain object models (XML DTD): interchange terms
- . . .
[Arrow label: More Knowledge formalized]

Data and Knowledge
Information is created at the confluence of data (the state) and knowledge (the ability to select and project the state into the future)

[Diagram: a Data Loop (much data) and a Knowledge Loop (diverse knowledge), each bound to an application; labels: Storage, Education, Selection, Recording, Integration, Abstraction, Experience, State changes, Decision-making, Action]

Two Mismatch Solutions
- A single, globally consistent ontology (Your Hope)
  - wonderful for users and their programs
  - too many interacting sources
  - long time to achieve: 2 sources (UAL, LH), 3 (+ trucks), 4, … all?
  - costly maintenance, since all sources evolve
  - no world-wide authority to dictate conformance
- Domain-specific ontologies (XML DTD assumption)
  - small, focused, cooperating groups
  - high quality; some examples: arthritis, Shakespeare plays
  - allows sharable, formal tools
  - ongoing, local maintenance affecting users: annual updates
  - poor interoperation, users still face inter-domain mismatches

Global consistency: Hope, but . . .
Common assumptions in assembling and integrating distributed information resources:
- The language used by the resources is the same
- Sub-languages used by the resources are subsets of a globally consistent language
These assumptions are provably false
Working towards the goal of global consistency is
1. naïve: the goal cannot be achieved
2. inefficient: languages are efficient in local contexts
3. unmaintainable: terminology evolves with progress

Domain-specific Expertise
- Knowledge needed is huge
- Partition into natural domains
- Determine domain responsibility and authority
- Empower domain owners
- Provide tools
- Consider interaction
Society of specialists

Domains and Consistency
- a domain will contain many objects
- the object configuration is consistent
- within a domain all terms are consistent & relationships among objects are consistent
- context is implicit
No committee is needed to forge compromises * within a domain
[Diagram label: Domain Ontology]
Compromises hide valuable details

SKC grounded definition
- Ontology: a set of terms and their relationships
- Term: a reference to real-world and abstract objects
- Relationship: a named and typed set of links between objects
- Reference: a label that names objects
- Abstract object: a concept which refers to other objects
- Real-world object: an entity instance with a physical manifestation (or its representation in a factual database)

Grounding enables implementation
- We use many abstract terms in our work
  - Needed because we are dealing with many objects
  - Human thinking is limited to short-term memory
- Someone must be able to translate them into code reliably
  - Each abstract term must have a path to reality
  - You must provide that path for students and coders
- Without a clear path that is not possible
  - Not automatically at all: machines need specs
  - Not reliably by human programmers: failures occur
- Without implementation there is no benefit
[Label: Advice]

An Ontology Algebra
A knowledge-based algebra for ontologies
- Intersection: create a subset ontology, keep sharable entries
- Union: create a joint ontology, merge entries
- Difference: create a distinct ontology, remove shared entries
The Articulation Ontology (AO) consists of matching rules that link domain ontologies
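A minimal sketch of the three operations, under strong simplifying assumptions: an ontology is reduced to a plain set of term names (the SKC definition also includes relationships), and an articulation rule is just a pair of matching terms. The `Rule` class, the function names, and the toy terms are hypothetical and not the SKC implementation; the terms are loosely borrowed from the shoe store and factory example in this deck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One articulation rule: a term in ontology A matches a term in ontology B."""
    left: str
    right: str


def intersection(a: set, b: set, rules: list) -> set:
    """Sharable entries: term pairs linked by a rule and present in both sources."""
    return {(r.left, r.right) for r in rules if r.left in a and r.right in b}


def difference(a: set, b: set, rules: list) -> set:
    """Entries of A that no rule links to B, i.e. material under local control."""
    matched = {left for left, _ in intersection(a, b, rules)}
    return a - matched


def union(a: set, b: set, rules: list) -> set:
    """Joint ontology: all of A, plus the entries of B that have no partner in A."""
    matched_right = {right for _, right in intersection(a, b, rules)}
    return a | (b - matched_right)


# Hypothetical toy ontologies (term names only).
store = {"size", "color", "style", "customers"}
factory = {"size", "colcode", "style", "machinery"}
rules = [Rule("size", "size"), Rule("color", "colcode"), Rule("style", "style")]

shared = intersection(store, factory, rules)
local_to_store = difference(store, factory, rules)
joint = union(store, factory, rules)
```

Here `shared` holds the three linked pairs, `local_to_store` is `{"customers"}`, and `joint` represents the factory's `colcode` by its store partner `color`. Because every result depends only on the two sources and the rule list, the operations can be recomputed when a source changes.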

Sample Operation: INTERSECTION
- Source Domain 1: owned and maintained by Store
- Source Domain 2: owned and maintained by Factory
- Result contains shared terms: terms useful for purchasing

Sample Intersections
Articulation ontology matching rules:
- size = size
- color = table(colcode)
- style = style

[Diagram: Shoe Factory ontology (Material inventory {...}, Employees {...}, Machinery {...}, Processes {...}, Shoes {...}) and Shoe Store ontology (Shoes {...}, Customers {...}, Employees {...}) linked by the rules above; Anatomy and Hardware ontologies ({...}) with foot = foot, Employees / Employees, Nail (toe, foot) vs. Nail (fastener); Department Store]

Other Basic Operations
- UNION: merging entire ontologies
- DIFFERENCE: material fully under local control
[Diagram label: Articulation ontology, typically prior intersections]

Features of an algebra
- Operations can be composed
- Operations can be rearranged
- Alternate arrangements can be evaluated
- Optimization is enabled
- The record of past operations can be kept and reused (experience: 3 months → 1 week for Webster's annual update, → 2 weeks for OED (6 x size) [Jannink:01])

INTERSECTION support
[Diagram: Store ontology and Factory ontology linked through an Articulation ontology; matching rules use terms from the 2 source domains; result: terms useful for purchasing]

Primitive Operations
Model and Instance ...
- Constructors: create object, create set
- Connectors: match object, match set
- Editors: insert value, edit value, move value, delete value
- Converters: object - value, object indirection, reference indirection
Unary
- Summarize: structure up
- Glossarize: list terms
- Filter: reduce instances
- Extract: circumscription
Binary
- Match: data corroboration
- Difference: distance measure
- Intersect: schema discovery
- Blend: schema extension
...

Matching rules (in process)
- A equals U
- A equals {u, … }
- A equals Union (U, V)
- A equals Union (U, {v, ...})
- A equals Intersection (U, W)
- A equals Intersection (U, {w, ...})
- A equals Difference (U, X)
- A equals Difference (U, {x, ...})
must obey algebraic properties

Sample Processing in HPKB
Question: What is the most recent year an OPEC member nation was on the UN Security Council (SC)? (A DARPA HPKB Challenge Problem)
SKC resolves 3 sources:
- CIA Factbook '96 (nations)
- OPEC (members, dates)
- UN (SC members, years)
SKC obtains the correct answer: 1996 (Indonesia)
Other groups obtained more, but factually wrong, answers; they relied on one global source, the CIA Factbook.
Problems resolved by SKC:
- Factbook, a secondary source, has out-of-date OPEC & UN SC lists: Indonesia not listed; Gabon (left OPEC 1994)
- different country names: Gambia => The Gambia
- historical country names: Yugoslavia
- UN lists future Security Council members: Gabon 1999
- needed ancillary data

Interoperation via Articulation
- At application definition time
  - Match relevant ontologies where needed
  - Establish articulation rules among them
  - Record the process
- At execution time
  - Perform query rewriting to get to sources
  - Optimize based on the ontology algebra
- For maintenance
  - Regenerate rules using the stored process formulation
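The execution-time step (query rewriting) might look roughly like the sketch below. It assumes a query is a flat dictionary of term/value constraints and that each articulation rule maps a shared term to a source term plus a value converter. The color table, its codes, and all names are hypothetical; only the rule shapes (size = size, color = table(colcode), style = style) come from the shoe example in this deck.

```python
# Hypothetical lookup table on the factory side: colcode -> color name.
COLOR_TABLE = {1: "red", 2: "blue"}

# The rule color = table(colcode) is inverted so that a color constraint
# can be rewritten into a colcode constraint for the factory source.
COLOR_TO_CODE = {name: code for code, name in COLOR_TABLE.items()}

# Articulation rules for one source: shared term -> (source term, value converter).
FACTORY_RULES = {
    "size": ("size", lambda value: value),
    "style": ("style", lambda value: value),
    "color": ("colcode", lambda value: COLOR_TO_CODE[value]),
}


def rewrite(query: dict, rules: dict) -> dict:
    """Rewrite a query over shared terms into the vocabulary of one source."""
    rewritten = {}
    for term, value in query.items():
        if term not in rules:
            raise KeyError(f"no articulation rule for term {term!r}")
        source_term, convert = rules[term]
        rewritten[source_term] = convert(value)
    return rewritten


factory_query = rewrite({"size": 9, "color": "red"}, FACTORY_RULES)
```

The rewritten query is then evaluated by the source's own engine; a term without a rule is rejected rather than guessed, which is the precision-preserving choice.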

Generation of the rules
- Provide library of automatic match heuristics
  - Lexical methods: spelling
  - Structural methods: relative graph position
  - Reasoning-based methods: Nexus
  - → Hybrid methods
- Iteratively, with an expert in control
- GUI tool to
  - display matches and
  - verify generated matches using the human expert
  - expert can also supply matching rules

Articulation Generator
Being built by Prasenjit Mitra
[Diagram: Ont1 and Ont2 feed a Driver that combines a Thesaurus, a Context-based Word Relator, a Phrase Relator, a Semantic Network (Nexus), and a Structural Matcher, with a Human Expert in the loop, to produce OntA]

Lexical Methods
- Preprocessing rules
  - Expert-generated seed rules, e.g., (Match O1.President O2.PrimeMinister)
  - Context-based preprocessing directives
- Thesaurus: synonyms, generalizations (yellow → ochre, canary)
- Nexus: term relationship graph (Owner = buyer); distance of words as measure of relatedness
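As an illustration of a spelling-based heuristic, the sketch below scores term pairs with Python's standard `difflib.SequenceMatcher`. This is a stand-in chosen for the example, not the matcher used in SKC; the threshold and the two toy term lists are assumptions.

```python
from difflib import SequenceMatcher


def spelling_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] based on matching character blocks, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def suggest_matches(terms_a, terms_b, threshold=0.8):
    """Propose candidate term pairs whose spelling similarity meets the threshold.

    The result is only a list of suggestions, best first; an expert still
    has to accept or reject each one.
    """
    candidates = []
    for a in terms_a:
        for b in terms_b:
            score = spelling_similarity(a, b)
            if score >= threshold:
                candidates.append((a, b, score))
    return sorted(candidates, key=lambda c: c[2], reverse=True)


# Hypothetical terms from a vehicle ontology and a transport ontology.
vehicle_terms = ["truck", "car"]
transport_terms = ["trucks", "cargo"]

suggested_pairs = [(a, b) for a, b, _ in suggest_matches(vehicle_terms, transport_terms)]
```

Lowering the threshold would also propose car / cargo, a spelling-similar but semantically wrong pair, which is why the deck keeps a human expert in control of the generated matches.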

Tools to create articulations
[Screenshot: graph matcher for the articulation-creating expert, showing a Vehicle ontology, a Transport ontology, and suggestions for articulations]

Continue from initial point
- Also suggest similar terms for further articulation:
  - by spelling similarity
  - by graph position
  - by term match repository
- Expert response: 1. Okay 2. False 3. Irrelevant to this articulation
- All results are recorded
- Okay's are converted into articulation rules
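The review loop described above can be sketched as follows. The three response values come from the slide; the data structures, function names, and sample suggestions are hypothetical.

```python
from enum import Enum


class Response(Enum):
    """The three expert responses listed on the slide."""
    OKAY = 1
    FALSE = 2
    IRRELEVANT = 3


def review(suggestions, ask_expert):
    """Ask the expert about every suggested pair and record every answer."""
    log = []
    for left, right in suggestions:
        log.append((left, right, ask_expert(left, right)))
    return log


def to_rules(log):
    """Convert only the accepted suggestions into articulation rules."""
    return [(left, right) for left, right, response in log if response is Response.OKAY]


# Hypothetical suggestions with canned expert answers standing in for the GUI.
answers = {
    ("truck", "lorry"): Response.OKAY,
    ("car", "cargo"): Response.FALSE,
    ("nail", "foot"): Response.IRRELEVANT,
}
log = review(list(answers), lambda left, right: answers[(left, right)])
rules = to_rules(log)
```

Keeping rejected and irrelevant answers in the log, not just the accepted ones, means the same suggestions need not be put to the expert again when the rules are regenerated.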

Candidate Match Nexus
- Term linkages automatically extracted from 1912 Webster's dictionary *
- Based on processing headwords → definitions using algebra primitives
- Notice presence of 2 domains: chemistry, transport
* free, we also have an OED-based nexus.

Using the nexus

Navigating the match repository

Example: NATO Country Graphs
Austria: bundestag ....

To be matched to
Great Britain: parliament ....
- 70% of documented matches found automatically
- Remainder required human interaction with our tools.

Broader Applications: Compose
[Diagram: ontologies for resources A, B, C, D and E; articulation ontologies for (A ∩ B), (B ∩ C), (C ∩ D) and (C ∩ E); composed ontology for applications using A, B, C, E formed as (A ∩ B) ∪ (B ∩ C) ∪ (C ∩ E). Legend: ∪ = union, ∩ = intersection]

Exploiting the result
- Result has links to source
- Processing & query evaluation is best performed within Source Domains & by their engines
[Label: Future work]

Architectural development
[Diagram: architecture over time; layers: datasources (D1 to D6), wrappers (W1 to W3), mediators (M1, M2), integrators (I1, I2), applications (A1 to A6), connected by network middleware]

Transform Data to Information
- Application Layer: decision-makers at workstations
- Mediation Layer: value-added services
- Foundation Layer: data and simulation resources

Mediation and maintenance
- Application interface: changes of user needs
- Domain ontology changes
- Software & people: Owner / Creator, Maintainer, Lessor - Seller, Advertiser
- Models, programs, rules, caches, . . .
- Resource interfaces: resource functional & ontological changes
Tools can help, but changes have to be dealt with rapidly. Automated learning is typically too slow and requires many instances.

Domain Specialization
- Knowledge Acquisition (20% effort) & Knowledge Maintenance (80% effort *) to be performed by
  - Domain specialists
  - Professional organizations
  - Field teams of modest size
- autonomously maintainable
Empowerment
* based on experience with software

SKC Synopsis
- Research Objective: Precise answers from heterogeneous, imperfect, scalably many data sources
- Sources for Ontologies:
  - General: CIA World Factbook '96, UN-www, OPEC-www, Webster's Dictionary, Thesaurus, Oxford English Dictionary
  - Topical: NATO, BattleSpace Sensors, Logistics Servers
- Theory: Rule-based algebra over ontologies; Translation & Composition primitives
- Sponsor and collaboration: AFOSR; DARPA DAML program; W3C; Stanford KSL and SMI; Univ. of Karlsruhe, Germany; others.

Innovation in SKC
- No need to harmonize full ontologies
- Focus on what is critical for interoperation
- Rules specific for articulation
- Tools for creation and maintenance
- Maintenance is distributed
  - to n sources
  - to m articulation agents in mediators
- Potentially many sets of articulation rules
  - is m < n^2? Depends on semantic architecture density
  - a research question: density
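One way to read the m < n^2 remark, assuming each articulation links exactly two sources (an assumption; the slide does not say so): n sources admit at most n(n-1)/2 pairwise articulations, and "density" can be taken as the fraction of those that are actually maintained. The function names and numbers below are illustrative only.

```python
def max_pairwise_articulations(n_sources: int) -> int:
    """Upper bound on articulations if each one links exactly two sources."""
    return n_sources * (n_sources - 1) // 2


def articulation_density(m_articulations: int, n_sources: int) -> float:
    """Fraction of the possible pairwise articulations that actually exist."""
    possible = max_pairwise_articulations(n_sources)
    if possible == 0:
        return 0.0
    return m_articulations / possible


# Hypothetical architecture: 5 sources, 4 articulations (for example a chain).
bound = max_pairwise_articulations(5)
density = articulation_density(4, 5)
```

Under this reading the bound for 5 sources is 10, well below n^2 = 25, and a sparse architecture keeps m close to n rather than to the bound.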

Conclusion
- High precision is important for enterprise applications
  - cost of overload versus opportunity loss
- Semantic differences cause problems
  - Today solved by human intermediate experts
  - Will need automation support
  - Tools so that expert knowledge is captured and maintainable
- Scalability requires a thorough foundation
  - Algebra provides composition, formal basis, delegation
  - Formal composition supports maintenance
  - Delegation of responsibility and authority enhances quality
- Many research tasks left

Long Range Science Vision
[Diagram: Integration Science draws on
- Artificial Intelligence: knowledge mgmt, domain expertise, uncertainty
- Databases: access, storage, algebras
- Systems Engineering: analysis, documentation, costing
- . . .]