Page 1 + Semantic Interoperability – Yes! Presentation to the CIO Council June 18th 2007 Lucian Russell, Ph.D.


+ Page 2 Semantic Interoperability: What is it?
The Data Reference Model Version 2.0 states:
–3.2. Introduction – What is Data Description and Why is it Important …
–Semantic Interoperability: Implementing information sharing infrastructures between discrete content owners (even with using service-oriented architectures or business process modeling approaches) still has to contend with problems with different contexts and their associated meanings. Semantic interoperability is a capability that enables enhanced automated discovery and usage of data due to the enhanced meaning (semantics) that are provided for data.
Semantic Interoperability is a condition that is created with respect to a Data Resource that is under the control of an Agency.
–Associated with each Data Resource is another resource that allows a reasoning service to identify its semantics and determine its value with respect to a query
–Left on the table: how are reasoning services created, and what additional data resources are needed?

+ Page 3 In 2005 there was no direct answer, only a template
See DRM Version 2.0 Chapter 2, Figure 2.5
–Digital Data Resources can be Structured, Semi-Structured or Unstructured and can be contained within a document. These can describe a Data Asset.
–On the other hand, a Data Asset can provide a management context for a Digital Data Resource.
–Topics in a language can categorize either (i.e. they are instances of a class designated by the topic word).
To support enhanced automated discovery, though, we need to use a combination and constellation of some collection of instances of these three entities. Interoperability would then depend on the adequacy of the combination.
In 2005 the way was unclear, but there was a template. On Page 18: …Data Description artifacts are an output of the process of providing data syntax and semantics and a meaningful identification for a data resource so as to make it visible and usable by a COI.
The most effective government COI is the Global Change Master Directory (GCMD). The GCMD indexes 18 petabytes of multi-agency data: it was the template.

+ Page 4 In 2006 there were several breakthroughs
The unclassified R&D sponsored by the Intelligence Community had several important breakthroughs that impact enhanced discovery services
–AQUAINT – Advanced Question Answering for Intelligence
  –WordNet was enhanced to create a disambiguated description of the most common words in the English language, some 115,000 words and their meanings.
  –A markup language for time, TimeML
  –An extraction technique to parse English-language text and create logical relations
–NIMD – Novel Intelligence from Massive Data (FOUO)
  –The released (non-FOUO) slides announced a breakthrough from the IKRIS project
Interoperable Knowledge Representation for Intelligence Support (IKRIS)
The IKRIS Project's Challenge:
–How to enable interoperability of knowledge representation and reasoning (KR&R) technology developed by multiple organizations in multiple DTO programs and designed to perform different tasks
The Results:
–A new language, IKL, that translates among knowledge representation languages
–An extension of logic to 2nd-order and non-monotonic expressions
–A proof of equivalence among process specifications

+ Page 5 These results open the way to SI using - English!
The implications of the results are staggering – English descriptions in documents can be used to enable enhanced automated discovery.
There were limitations on concepts that could be represented:
–Prior semantic technology (e.g. OWL-DL) only allowed for precise descriptions of concepts represented by nouns, i.e. taxonomies. Ontologies were defined as overlapping taxonomies.
–WordNet now allows nouns to be unambiguously described.
–WordNet has clearly demonstrated that nouns have single-subtype taxonomies but verbs do not: because there is a time element in all verbs' meanings, they have four sub-classes (verbs describe 4-D motions or state changes).
–Consequently nouns and verbs cannot be intermixed meaningfully (without inconsistency) in OWL-DL ontologies.
–Representing concepts using verbs entails describing processes, which are multiple verbs in Part-of (meronymic/holonymic) relationships.
–English descriptions of processes were imprecise because relative time concepts were heretofore too poorly understood to support automation.
With WordNet and TimeML we can now precisely describe the processes that create and change data as well as the nouns used for the real world.

+ Page 6 TimeML: Markup Language for Temporal and Event Expressions
TimeML is a robust specification language for events and temporal expressions in natural language. It is designed to address four problems in event and temporal expression markup:
–(1) Time stamping of events (identifying an event and anchoring it in time);
–(2) Ordering events with respect to one another (lexical versus discourse properties of ordering);
–(3) Reasoning with contextually underspecified temporal expressions (temporal functions such as 'last week' and 'two weeks before');
–(4) Reasoning about the persistence of events (how long does an event or the outcome of an event last).
The rules that identify temporal dependencies can be used to insert tags into text. These can be processed. Processes that entail other sub-processes can also be processed logically, i.e. infer from 'A filed an application' the fact that 'A filled out an application'.
Language Computer Corporation (AQUAINT) finds logical relations in text
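Problems (1) to (3) on this slide can be illustrated with a small Python sketch. It is not TimeML and does not use any TimeML tooling: the offset table, the `Event` class and the example events are all hypothetical, and every relative expression is resolved against a single reference date, which is a simplification (in real text 'two weeks before' is relative to another anchored time).

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical, hand-written offsets for a few relative expressions.
# Real TimeML annotation relies on tags and much richer rules.
RELATIVE_OFFSETS = {
    "today": timedelta(days=0),
    "yesterday": timedelta(days=-1),
    "last week": timedelta(weeks=-1),
    "two weeks before": timedelta(weeks=-2),
}


@dataclass(frozen=True)
class Event:
    label: str
    time_expression: str


def anchor(event: Event, reference: date) -> date:
    """Time-stamp an event by resolving its expression against a reference date."""
    return reference + RELATIVE_OFFSETS[event.time_expression]


def order_events(events, reference):
    """Return event labels sorted from earliest to latest anchored date."""
    return [e.label for e in sorted(events, key=lambda e: anchor(e, reference))]


# Assumed reference date: the date of this presentation.
reference_date = date(2007, 6, 18)
events = [
    Event("application filed", "last week"),
    Event("application approved", "today"),
    Event("application filled out", "two weeks before"),
]
timeline = order_events(events, reference_date)
```

Once each expression is anchored to a calendar date, ordering the events is a plain sort; the hard part, which the slide attributes to TimeML rules, is deciding what each expression is relative to.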

+ Page 7 LCC Product Polaris Semantic Relations

 #  Semantic Relation             Abbr
 1  POSSESSION                    POS
 2  KINSHIP                       KIN
 3  PROPERTY-ATTRIBUTE HOLDER     PAH
 4  AGENT                         AGT
 5  TEMPORAL                      TMP
 6  DEPICTION                     DPC
 7  PART-WHOLE                    PW
 8  HYPONYMY                      ISA
 9  ENTAIL                        ENT
10  CAUSE                         CAU
11  MAKE-PRODUCE                  MAK
12  INSTRUMENT                    INS
13  LOCATION-SPACE                LOC
14  PURPOSE                       PRP
15  SOURCE-FROM                   SRC
16  TOPIC                         TPC
17  MANNER                        MNR
18  MEANS                         MNS
19  ACCOMPANIMENT-COMPANION       ACC
20  EXPERIENCER                   EXP
21  RECIPIENT                     REC
22  FREQUENCY                     FRQ
23  INFLUENCE                     IFL
24  ASSOCIATED-WITH / OTHER       OTH
25  MEASURE                       MEA
26  SYNONYMY-NAME                 SYN
27  ANTONYMY                      ANT
28  PROBABILITY-OF-EXISTENCE      PRB
29  POSSIBILITY                   PSB
30  CERTAINTY                     CRT
31  THEME-PATIENT                 THM
32  RESULT                        RSL
33  STIMULUS                      STI
34  EXTENT                        EXT
35  PREDICATE                     PRD
36  BELIEF                        BLF
37  GOAL                          GOL
38  MEANING                       MNG
39  JUSTIFICATION                 JST
40  EXPLANATION                   EXN
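To make the relation table concrete, here is a minimal Python sketch of how extracted relations might be stored as triples and how an ENTAIL relation could support the 'filed' implies 'filled out' inference mentioned earlier. This is not LCC's Polaris API or output format: the `Triple` layout, the argument order and the facts themselves are assumptions made for illustration; only the abbreviations come from the table.

```python
from collections import namedtuple

# A few abbreviations taken from the Polaris relation table on this slide.
RELATIONS = {"AGT": "AGENT", "THM": "THEME-PATIENT", "ISA": "HYPONYMY", "ENT": "ENTAIL"}

# Hypothetical storage layout: (relation, source, target).
Triple = namedtuple("Triple", "relation source target")

# Hypothetical facts for the sentence "A filed an application".
facts = {
    Triple("AGT", "A", "file"),            # A is the agent of "file"
    Triple("THM", "application", "file"),  # the application is what was filed
    Triple("ENT", "file", "fill out"),     # filing entails filling out
}


def infer_entailed(facts):
    """Copy the participants of an event onto every event it entails (one pass)."""
    entailments = [(f.source, f.target) for f in facts if f.relation == "ENT"]
    derived = set(facts)
    for cause, effect in entailments:
        for f in facts:
            if f.relation != "ENT" and f.target == cause:
                derived.add(Triple(f.relation, f.source, effect))
    return derived


derived = infer_entailed(facts)
```

After the call, `derived` also contains the agent and theme of 'fill out', which is the kind of implicit fact a discovery service could then match against a query.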

+ Page 8 LCC's Jaguar: Knowledge Extraction
LCC's Jaguar product can automatically generate ontologies and structured knowledge bases from text
–Ontologies form the framework or skeleton of the knowledge base
–Rich set of semantic relations form the muscle that connects concepts in the knowledge base
[Figure: example knowledge graph. Concept labels: train, passenger train, freight train, carry, conduct, board, ship, transport, arrive, run, stop. Relation labels: IS-A, AGENT, THEME, MEANS.]

+ Page 9 It is now Cost Effective to Document Databases!
Previously documentation of databases was a black hole for budget $$
–Only people would read the documentation
–It was never kept up to date
–Rules within it evolved over time
–Hence people never read the documentation anyway and the data was inconsistent
–ETL techniques, Data Warehouses and Data Marts were used to get uniformity, but substituting computer-generated data for stored data is no guarantee of accuracy.
Now text descriptions of databases can be processed automatically
–The correct WordNet sense of each word can be used. A correct description of the relationships among data attributes and the processes that describe how they were created can now be used for semantic processing.
–The text can be extracted and used to create knowledge repositories!
AQUAINT and NIMD also enhanced the CYC Knowledge Base
–CYCORP has the world's largest general ontology and knowledge base describing the real world. It can be extended and used for Interoperability.

+ Page 10

+ Page 11 How can this be done? Carefully!
–Real World Data: Data are samples
–Social World Data: Data are State Changes
–Data about Individuals: Data are Both
–Mathematical Patterns
Old fashioned 1970s Data Modeling destroys distinctions: Lost Gold! Gray mass of sameness

+ Page 12 Look at each type of data and how it comes into being!
Example: A USCIS form has 10 Object types
Photograph, Signature, Fingerprints
1: Data Elements: Name & Country of Citizenship
2: Data Elements: Identification Numbers
3: Data Elements: Residence History
4: Data Elements: Education History
5: Data Elements: Employment History
6: Data Elements: Arrivals & Departures
7: Data Elements: Arrests & Citations
8: Data Elements: Marital Information
9: Data Elements: Children's Names
10: Data Elements: Parents' Country of Citizenship

+ Page 13 Structured Data and Schema Mismatch
Syntactic Schema Mismatch:
–IEEE Computer, December 1991, showed that a large number of syntactic mismatches among representations of data were a barrier to data integration or sharing.
Entities = Attributes = Data Values – Nonsense or Computer Science?
Computer Science: Semantic Schema Mismatches
–In 1986 Computing Surveys published the observation that, when integrating databases, one Database's Entity could be another Database's Attribute
–In 1991 a research result showed that an Attribute in one Database could be a Data Value in another Database
So, with a potential for this degree of mismatch, sending XML schemas to a repository is not necessarily a help to semantic interoperability. The field of database integration essentially went dead in 1991.
HOWEVER, another side effect of IKRIS is that it is now possible to detect semantic similarities among databases even when there are different representations of the data as entity, attribute and data values – it won't be perfect but it will be a lot better than what we have.
Additional work is starting on using ANSI Data Dictionary structures and populating them automatically.
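The entity / attribute / data value mismatch described on this slide can be shown with a small Python sketch. The three toy databases, their table and field names, and the mapping functions are all hypothetical; each database stores the same fact ("Alice's employer is Acme") in a different structural position.

```python
# Database A models Employer as an entity (its own table).
db_a = {
    "Employer": [{"employer_id": 1, "name": "Acme"}],
    "Person": [{"name": "Alice", "employer_id": 1}],
}
# Database B models employer as an attribute of Person.
db_b = {"Person": [{"name": "Alice", "employer": "Acme"}]}
# Database C stores "employer" as a data value in a generic property table.
db_c = {
    "PersonProperty": [{"person": "Alice", "property": "employer", "value": "Acme"}]
}


def facts_a(db):
    """Flatten the entity form into (subject, relation, object) facts."""
    names = {e["employer_id"]: e["name"] for e in db["Employer"]}
    return {(p["name"], "employer", names[p["employer_id"]]) for p in db["Person"]}


def facts_b(db):
    """Flatten the attribute form into the same fact shape."""
    return {(p["name"], "employer", p["employer"]) for p in db["Person"]}


def facts_c(db):
    """Flatten the data-value form into the same fact shape."""
    return {(r["person"], r["property"], r["value"]) for r in db["PersonProperty"]}


same_meaning = facts_a(db_a) == facts_b(db_b) == facts_c(db_c)
```

Here the three mapping functions were written by hand by someone who already knew the schemas mean the same thing. Comparing the schemas alone would not reveal that; the slide's claim is that text descriptions processed with the newer tools could help detect such equivalences, which this sketch does not attempt.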

+ Page 14 In Conclusion
It is possible to increase Data Sharing in the government
To enable enhanced automated discovery
–Start with the Global Change Master Directory as a template and expand
–Create new data descriptions
–Use the English language correctly
–Build process descriptions that show how and when data was generated
–Use advanced linguistic tools to extract data relationships
–Integrate with a general knowledge base
To overcome Schema Mismatch
–Revisit old data models and carefully expand existing definitions to show the full semantics of the data schema
–Keep in mind that in the Real World one collects data samples of continuous processes whereas the Social World records state changes. Individuals' data combines both.
There is no easy solution, but advanced tools ensure that any effort spent today is re-usable tomorrow, so there is no loss of value for investments in improving data descriptions.