Information Artifact Ontology: General Background Barry Smith 1.

Slides 2

Barry Smith – who am I? Director: National Center for Ontological Research (Buffalo). Founder: Ontology for the Intelligence Community (OIC, now STIDS) conference series. Ontology work for: NextGen (Next Generation) Air Transportation System; National Nuclear Security Administration, DoE; Joint-Forces Command Joint Warfighting Center; Army Net-Centric Data Strategy Center of Excellence; Army Intelligence and Information Warfare Directorate (I2WD); and many national and international biomedical research and healthcare agencies 3

I2WD Ontology Team: Ron Rudnicki (CUBRC, University at Buffalo); Dr. Tatiana Malyuta (NY City College of Technology of CUNY, Data Tactics Corp.); David Salmen (Data Tactics Corp.); LCOL Dr. William Mandrick (Data Tactics Corp.) 4

In the olden days people measured lengths using inches, ulnas, perches, king’s feet, Swiss feet, leagues of Paris, etc., etc. 5

On June 22, 1799, in Paris, everything changed 6

International System of Units (SI) 7

Making data (re-)usable through standard terminologies Standards provide – common structure and terminology – single data source for review (less redundant data) Standards allow – use of common tools and techniques – common training – single validation of data 8

One successful part of the solution to this problem = Ontologies: controlled vocabularies (nomenclatures) plus definitions of terms in a logical language. Standardized (logically defined) terms in an ontology are the equivalent of standardized units in the SI 9

Ontologies – are computer-tractable representations of types in specific areas of reality – are more and less general (upper and lower ontologies): upper = organizing ontologies; lower = domain ontology modules 10

Linked Open Data are not enough 11

Links are inconsistently defined; ontologies are full of redundancies 12

Towards coordination of modular non-redundant ontologies 13

The Open Biomedical Ontologies (OBO) Foundry, arranged by RELATION TO TIME (continuant, independent or dependent, vs. occurrent) and GRANULARITY:
ORGAN AND ORGANISM – independent continuant: Organism (NCBI Taxonomy), Anatomical Entity (FMA, CARO); dependent continuant: Organ Function (FMP, CPRO), Phenotypic Quality (PaTO); occurrent: Biological Process (GO)
CELL AND CELLULAR COMPONENT – independent continuant: Cell (CL), Cellular Component (FMA, GO); dependent continuant: Cellular Function (GO)
MOLECULE – independent continuant: Molecule (ChEBI, SO, RnaO, PrO); dependent continuant: Molecular Function (GO); occurrent: Molecular Process (GO) 14

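Population-level ontologies, arranged by RELATION TO TIME (continuant, independent or dependent, vs. occurrent) and GRANULARITY:
COMPLEX OF ORGANISMS – independent continuant: Family, Community, Deme, Population (PCO); dependent continuant: Population Phenotype; occurrent: Population Process
ORGAN AND ORGANISM – independent continuant: Organism (NCBI Taxonomy), Anatomical Entity (FMA, CARO); dependent continuant: Organ Function (FMP, CPRO), Phenotypic Quality (PaTO); occurrent: Biological Process (GO)
CELL AND CELLULAR COMPONENT – independent continuant: Cell (CL), Cellular Component (FMA, GO); dependent continuant: Cellular Function (GO)
MOLECULE – independent continuant: Molecule (ChEBI, SO, RnaO, PrO); dependent continuant: Molecular Function (GO); occurrent: Molecular Process (GO) 15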

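Environments: the OBO Foundry arrangement by RELATION TO TIME (continuant, independent or dependent, vs. occurrent) and GRANULARITY, with the Environment Ontology (EnvO) added:
ORGAN AND ORGANISM – independent continuant: Organism (NCBI Taxonomy), Anatomical Entity (FMA, CARO); dependent continuant: Organ Function (FMP, CPRO), Phenotypic Quality (PaTO); occurrent: Biological Process (GO)
CELL AND CELLULAR COMPONENT – independent continuant: Cell (CL), Cellular Component (FMA, GO); dependent continuant: Cellular Function (GO)
MOLECULE – independent continuant: Molecule (ChEBI, SO, RnaO, PrO); dependent continuant: Molecular Function (GO); occurrent: Molecular Process (GO) 16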

OBO Foundry approach extended into other domains:
NIF Standard – Neuroscience Information Framework
IDO Consortium – Infectious Disease Ontology
cROP – Common Reference Ontologies for Plants
MilPortal.org – Military Ontology
AIRS Ontology Suite – Intelligence Ontology Suite 17

slide from Margaret Storey 21

Horizontal Integration of Big Intelligence Data: The Role of Ontology in the Era of Big Data. T. Malyuta, Ph.D., New York City College of Technology, NY, NY; B. Smith, Ph.D., University at Buffalo, Buffalo, NY; R. Rudnicki, CUBRC, Buffalo, NY

23 index.php/Main_Page#Documents

Big Data Problem. Wikipedia defines Big Data as “…a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools.” Gartner defines Big Data with three ‘V’s: – Volume – Velocity (of production and analysis) – Variety – Recently a fourth ‘V’, Veracity, was added. This means that Big Data are beyond our control (as opposed to those complex and big systems with diverse and changing data where the complexity is known) 24

Big Data Solution – Agility Dimensions of agility – Storage paradigms that accommodate massive volumes of heterogeneous data – Data processing paradigms that can deal with the massive volumes of heterogeneous data coming onstream – Dynamic data stores that can easily accommodate diverse and a priori unknown data types and semantics – Methods and tools that leverage dynamic and diverse content 25

Agile Data Management. New highly distributed computing (MapReduce) and data processing (Bigtable) paradigms and technologies based on them (hadoop.apache.org/, hbase.apache.org/) help in solving data management problems: – Store and process Volume – Keep up with Velocity – Represent Variety. These technologies are not meant (and never were meant) to provide data interpretation – In data systems we have been dealing with data the meaning of which we knew (usually via data applications) – These technologies do not help in solving the problem of data integration and interoperability of systems 26

Agile Utilization. Today, the main problem of Big Data is how to use it – Utilization of ‘Variety’: diverse and a priori unknown types and semantics – Ability of Big Data systems to interoperate – Ability to integrate Big Data – The last two problems are inherently difficult and cannot be properly addressed by the data technology itself. Traditional data utilization and integration approaches fail. Relying on legacy data models and mappings (linked open data) fails: it creates forking and mapping degradation. Agile utilization and integration paradigms are needed 27

The Problem of Horizontal Integration of Big Intelligence Data. HI =Def. the ability to exploit multiple data sources as if they were one. Recognized issues for HI with existing approaches – Data silos – Lexicon/semantics silos. Requirement for HI of Big Intelligence Data – Agile Semantic Interoperability: a strategy for HI must be agile in the sense that it can be quickly extended to new zones of emerging data according to need; ontology allows an incremental approach that delivers a big bang already from the very first buck (as we showed on the project described below); ontology can provide the needed agility 28

Agile Semantic Interoperability A good solution has to be – Able to grow incrementally – Able to be developed in a distributed manner – Without losing consistency – Independent of particular implementations, and data producers and consumers – Applicable to data in an agile manner We call our solution: ‘semantic enhancement’ (SE) of data 29

Explication vs. Annotation. Explication of general terms used in source intelligence artifacts and in data models, terminologies and doctrinal publications which provide typologies of intelligence-related IAs, to semantically enhance data in a way that enables computational integration and reasoning. Annotation of the instance-level information captured by such IAs, to aid retrieval of information about specific persons, groups, events, documents, images, and so forth

SE Types. Explication of general terms used in source intelligence artifacts and in data models, terminologies and doctrinal publications which provide typologies of intelligence-related IAs, to semantically enhance data in a way that enables computational integration and reasoning. Annotation of the instance-level information captured by such IAs, to aid retrieval of information about specific persons, groups, events, documents, images, and so forth

SE. SE is realized with the help of ontologies that are used to explicate data models and annotate data instances – Vocabulary of ontologies used for explications and annotations provides agile horizontal integration – Ontologies, by virtue of their nature and organization, provide semantic enhancement of data
Example table (columns PersonID, Name, Description): 111, Java, Programming; 222, SQL, Database
Ontology terms shown in the figure: SQL, Java, C++; ProgrammingSkill; ComputerSkill; Skill; Education; Technical Education 32

The Meaning of ‘Enhancement’. Semantic enhancement/enrichment of data = arm’s length approach (no change to data) – through simple explication we associate an entire knowledge system with a database field – this enables analytics to process data, e.g. about computer skills, “vertically” along the Skill hierarchy, as well as “horizontally” via relations between Skill and Education – and further: while data in the database does not change, its analysis can become richer and richer as our understanding of reality changes. For this richness to be leveraged by different communities, persons, and applications, it needs to have the properties mentioned above and be constructed in accordance with the SE principles 33
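The arm's length idea can be sketched in a few lines of Python. This is only an illustrative sketch, not the project's implementation: the shape of the Skill hierarchy, the `acquired_through` style relation stored in `RELATED`, and all function names are assumptions made for the example. The two rows are the ones from the SE slide, and they are never modified.

```python
# Toy ontology as child -> parent (is_a) links. The exact hierarchy is an
# assumption reconstructed from the terms listed on the SE slide.
IS_A = {
    "Java": "ProgrammingSkill",
    "SQL": "ProgrammingSkill",
    "C++": "ProgrammingSkill",
    "ProgrammingSkill": "ComputerSkill",
    "ComputerSkill": "Skill",
    "TechnicalEducation": "Education",
}

# Hypothetical "horizontal" relation between a Skill type and an Education type.
RELATED = {"ComputerSkill": "TechnicalEducation"}

# Source data: stays untouched (arm's length).
ROWS = [
    {"PersonID": 111, "Name": "Java", "Description": "Programming"},
    {"PersonID": 222, "Name": "SQL", "Description": "Database"},
]

# Explication: the database field "Name" holds labels of Skill types.
EXPLICATION = {"Name": "Skill"}


def ancestors(term):
    """Return all types above `term` in the is_a hierarchy, nearest first."""
    chain = []
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain


def persons_with(onto_type):
    """Vertical query: persons whose explicated field value is `onto_type`
    or one of its subtypes."""
    hits = []
    for row in ROWS:
        for field in EXPLICATION:
            value = row[field]
            if value == onto_type or onto_type in ancestors(value):
                hits.append(row["PersonID"])
    return hits


def related_education(value):
    """Horizontal query: follow the Skill-to-Education relation from the
    nearest type that has one."""
    for term in [value] + ancestors(value):
        if term in RELATED:
            return RELATED[term]
    return None


print(persons_with("ComputerSkill"))
print(related_education("SQL"))
```

A query for `ComputerSkill` finds both persons even though neither row mentions that term, which is the "vertical" enrichment the slide describes; the lookup through `RELATED` stands in for the "horizontal" one.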

SE Principles – Create a Shared Semantic Resource (SSR) of ontologies to be used for explication and annotation – Establish an agile strategy for building ontologies within this SSR, and apply and extend these ontologies to explicate and annotate new source data as they come onstream – Problem: Given the immense and growing variety of data sources, the development methodology must be applied by multiple different groups – How to manage collaboration? 34

Achieving the Goal – Methodology of incremental distributed ontology development – A common ontology architecture incorporating a common, domain-neutral, upper-level ontology (BFO) – A shared governance and change management process – A simple, repeatable process for ontology development – An ontology registry – A process of intelligence data capture through explication of source data models 35

Main Methodological Points Ontological realism – Based on Doctrine / Science – Involves SMEs in label selection and definition – Thoroughly tested in many projects Arms-length process, with minimal disturbance to existing data and data semantics Reference ontologies – capture generic content and are designed for aggressive reuse in multiple different types of context: Single reference ontology for each domain of interest Application ontologies – are tied to specific local applications – An application ontology is created by combining local content with generic content taken from relevant reference ontologies – Still interoperable because based on common set of reference ontologies * Barry Smith and Werner Ceusters, “Ontological Realism as a Methodology for Coordinated Evolution of Scientific Ontologies”, Applied Ontology, 5 (2010), 139–

Arms-length Process: SE ontology labels. Focusing on the terms (labels, acronyms, codes) used in our source data: where multiple distinct terms {t1, …, tn} are used in separate data sources with one and the same meaning, they are associated with a single preferred label drawn from a standard set of such labels. All the separate data items associated with the {t1, …, tn} are thereby linked together through the corresponding preferred labels. Preferred labels form the basis for the ontologies we build. [Figure: heterogeneous contents ABC, KLM, XYZ] 37
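The preferred-label step can be illustrated with a small Python sketch. The source names, local terms, and record identifiers below are invented for the example and are not taken from the project; the point is only that records from separate sources become linked once their local terms are associated with one preferred label.

```python
from collections import defaultdict

# Hypothetical mapping from (source, local term) to a preferred label.
PREFERRED = {
    ("source_A", "arty tractor"): "artillery tractor",
    ("source_B", "ART-TRAC"): "artillery tractor",
    ("source_C", "gun tractor"): "artillery tractor",
    ("source_A", "crane"): "crane",
}

# Hypothetical data items: (source, local term, item identifier).
RECORDS = [
    ("source_A", "arty tractor", "A-001"),
    ("source_B", "ART-TRAC", "B-417"),
    ("source_C", "gun tractor", "C-093"),
    ("source_A", "crane", "A-002"),
    ("source_B", "UNKNOWN-CODE", "B-999"),  # not yet explicated
]


def link_by_preferred_label(records, preferred):
    """Group data items by preferred label; the source records are only
    read, never rewritten. Items whose term has no preferred label yet
    are left out until the term is explicated."""
    linked = defaultdict(list)
    for source, term, item in records:
        label = preferred.get((source, term))
        if label is not None:
            linked[label].append((source, item))
    return dict(linked)


LINKED = link_by_preferred_label(RECORDS, PREFERRED)
print(LINKED["artillery tractor"])
```

Adding a new source in this sketch means adding entries to `PREFERRED` only, which mirrors the incremental character of the approach described on the slide.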

Reference and Application Ontologies
Reference Ontology:
vehicle =def. an object used for transporting people or goods
tractor =def. a vehicle that is used for towing
crane =def. a vehicle that is used for lifting and moving heavy objects
vehicle platform =def. means of providing mobility to a vehicle
wheeled platform =def. a vehicle platform that provides mobility through the use of wheels
tracked platform =def. a vehicle platform that provides mobility through the use of continuous tracks
Application Definitions:
artillery vehicle =def. vehicle designed for the transport of one or more artillery weapons
wheeled tractor =def. a tractor that has a wheeled platform
tracked tractor =def. a tractor that has a tracked platform
artillery tractor =def. an artillery vehicle that is a tractor
wheeled artillery tractor =def. an artillery tractor that has a wheeled platform 38
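The definitions above follow a genus-plus-differentia pattern, so the parent of each term can be read off its definition. The Python sketch below encodes that reading; the split into reference and application terms follows the order of the definitions on the slide and is an interpretation, and the dictionary and function names are hypothetical.

```python
# Parents (genus) read off each definition on the slide.
PARENTS = {
    "vehicle": [],
    "tractor": ["vehicle"],
    "crane": ["vehicle"],
    "vehicle platform": [],
    "wheeled platform": ["vehicle platform"],
    "tracked platform": ["vehicle platform"],
    "artillery vehicle": ["vehicle"],
    "wheeled tractor": ["tractor"],
    "tracked tractor": ["tractor"],
    "artillery tractor": ["artillery vehicle", "tractor"],
    "wheeled artillery tractor": ["artillery tractor"],
}

# Differentia of the form "has a ... platform".
HAS_PLATFORM = {
    "wheeled tractor": "wheeled platform",
    "tracked tractor": "tracked platform",
    "wheeled artillery tractor": "wheeled platform",
}

# Assumed split: the first six definitions are reference terms.
REFERENCE = {
    "vehicle", "tractor", "crane",
    "vehicle platform", "wheeled platform", "tracked platform",
}
APPLICATION = set(PARENTS) - REFERENCE


def superclasses(term):
    """Return every term reachable from `term` by following parent links."""
    found = set()
    stack = list(PARENTS[term])
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(PARENTS[current])
    return found


def grounded_in_reference(term):
    """True if an application term has at least one reference ancestor,
    which is what keeps application ontologies interoperable."""
    return bool(superclasses(term) & REFERENCE)


print(sorted(superclasses("wheeled artillery tractor")))
print(all(grounded_in_reference(t) for t in APPLICATION))
```

In this toy encoding every application term bottoms out in the shared reference terms, so two applications that both use "tractor" or "vehicle" can still be queried together.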

Illustration of Ontology Types (Toy Example). [Figure: hierarchy of Vehicle, Tractor, Wheeled Tractor, Artillery Tractor, Wheeled Artillery Tractor, Artillery Vehicle. Black – reference ontologies; Red – application ontologies] 39

Role of Reference Ontologies Normalized – Maintains a set of consistent ontologies – Eliminates redundancy Modular – A set of plug-and-play ontology modules – Enables distributed consistent development Surveyable 40

SE Architecture. The Upper Level Ontology (ULO) in the SE hierarchy must be maximally general (no overlap with domain ontologies). The Mid-Level Ontologies (MLOs) introduce successively less general and more detailed representations of types which arise in successively narrower domains until we reach the Lowest Level Ontologies (LLOs). The LLOs are maximally specific representations of the entities in a particular one-dimensional domain 41

Architecture Illustration 42

Challenges to HI Too many lexicons The scope of the domain: signal, sensor, image, … intelligence about … the whole world Difficult to conduct governance and management of ontology development to ensure consistent evolution Lack of expertise Complexity of the ontology development and application process 43

Preventing Failure The method we use offers solutions to some of the common reasons for failure Lack of Consensus – Realism offers an objective standard for settling disputes over terminology. Ontology development becomes an empirical science instead of an exercise in the publication of dialects – Governance helps to resolve conflicts and achieve consensus High Maintenance – Arm’s length implementation places no additional overhead onto applications Parochialism – Architecture and methodology prevent development of vocabularies that apply only to a single perspective Poor Quality – Experience prevents common mistakes in vocabularies that cause downstream problems with search and analytics 44

Preventing Failure (cont.) Agile ontology development – Methodology and architecture – Growing SSR. Agile ontology application – Incremental – Semi-automated where possible – Even if not as fast as some want it to be, it is still faster than creating a physical store, which will be just another silo and will still need to be integrated with the rest of the data. Once a data collection is semantically enhanced, it is integrated, without any additional effort, with all data that have been or will be semantically enhanced 45

What is Next… – IAO-Intel: An Information Artifact Ontology for the Intelligence Community (BS) – A Survey of DCGS-A Ontology Work and Explicating and Annotating Processes (R. Rudnicki) – Ontology: illustration of the methodology of ontology design and of the IAO-Intel (D. Salmen and W. Mandrick) 46

References
Barry Smith, Tatiana Malyuta, William S. Mandrick, Chia Fu, Kesny Parent, Milan Patel, “Horizontal Integration of Warfighter Intelligence Data: A Shared Semantic Resource for the Intelligence Community”, STIDS Conference, 2012.
Barry Smith, Tatiana Malyuta, David Salmen, William Mandrick, Kesny Parent, Shouvik Bardhan, Jamie Johnson, “Ontology for the Intelligence Analyst”, Crosstalk: The Journal of Defense Software Engineering.
David Salmen, Tatiana Malyuta, Alan Hansen, Shaun Cronen, Barry Smith, “Integration of Intelligence Data through Semantic Enhancement”, STIDS Conference, 2011. 47