Published by Abel Gaines. Modified over 9 years ago.
1
Information Artifact Ontology: General Background
Barry Smith
2
Slides: http://ncorwiki.buffalo.edu/index.php/STIDS_2013
3
Barry Smith – who am I?
Director: National Center for Ontological Research (Buffalo)
Founder: Ontology for the Intelligence Community (OIC, now STIDS) conference series
Ontology work for
– NextGen (Next Generation) Air Transportation System
– National Nuclear Security Administration, DoE
– Joint-Forces Command Joint Warfighting Center
– Army Net-Centric Data Strategy Center of Excellence
– Army Intelligence and Information Warfare Directorate (I2WD)
and for many national and international biomedical research and healthcare agencies
4
I2WD Ontology Team
Ron Rudnicki, CUBRC, University at Buffalo
Dr. Tatiana Malyuta, NY City College of Technology of CUNY, Data Tactics Corp.
David Salmen, Data Tactics Corp.
LCOL Dr. William Mandrick, Data Tactics Corp.
5
In the olden days people measured lengths using inches, ulnas, perches, king’s feet, Swiss feet, leagues of Paris, etc.
6
On June 22, 1799, in Paris, everything changed
7
International System of Units (SI)
8
Making data (re-)usable through standard terminologies
Standards provide
– common structure and terminology
– single data source for review (less redundant data)
Standards allow
– use of common tools and techniques
– common training
– single validation of data
9
One successful part of the solution to this problem = Ontologies
– controlled vocabularies (nomenclatures) plus definitions of terms in a logical language
Standardized (logically defined) terms in an ontology are the equivalent of standardized units in the SI
10
Ontologies
– are computer-tractable representations of types in specific areas of reality
– are more and less general (upper and lower ontologies)
  – upper = organizing ontologies
  – lower = domain ontology modules
11
Linked Open Data are not enough
12
Links are inconsistently defined; ontologies are full of redundancies
13
Towards coordination of modular non-redundant ontologies
14
The Open Biomedical Ontologies (OBO) Foundry
Columns (RELATION TO TIME): CONTINUANT, INDEPENDENT | CONTINUANT, DEPENDENT | OCCURRENT
Rows (GRANULARITY):
ORGAN AND ORGANISM | Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Biological Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO) |
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)
15
Population-level ontologies
Columns (RELATION TO TIME): CONTINUANT, INDEPENDENT | CONTINUANT, DEPENDENT | OCCURRENT
Rows (GRANULARITY):
COMPLEX OF ORGANISMS | Family, Community, Deme, Population (PCO) | Population Phenotype | Population Process
ORGAN AND ORGANISM | Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Biological Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO) |
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)
16
Environments: Environment Ontology (EnvO)
Columns (RELATION TO TIME): CONTINUANT, INDEPENDENT | CONTINUANT, DEPENDENT | OCCURRENT
Rows (GRANULARITY):
ORGAN AND ORGANISM | Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Biological Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO) |
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)
17
OBO Foundry approach extended into other domains
– NIF Standard: Neuroscience Information Framework
– IDO Consortium: Infectious Disease Ontology
– cROP: Common Reference Ontologies for Plants
– MilPortal.org: Military Ontology
– AIRS Ontology Suite: Intelligence Ontology Suite
21
Slide from Margaret Storey
22
Horizontal Integration of Big Intelligence Data
The Role of Ontology in the Era of Big Data
T. Malyuta, Ph.D., New York City College of Technology, NY, NY
B. Smith, Ph.D., University at Buffalo, Buffalo, NY
R. Rudnicki, CUBRC, Buffalo, NY
23
http://ncorwiki.buffalo.edu/index.php/Main_Page#Documents
24
Big Data Problem
Wikipedia defines Big Data as “…a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools.”
Gartner defines Big Data with three ‘V’s:
– Volume
– Velocity (of production and analysis)
– Variety
– Recently a fourth ‘V’, Veracity, was added
This means that Big Data are beyond our control (as opposed to those complex and big systems with diverse and changing data where the complexity is known)
25
Big Data Solution – Agility
Dimensions of agility
– Storage paradigms that accommodate massive volumes of heterogeneous data
– Data processing paradigms that can deal with the massive volumes of heterogeneous data coming onstream
– Dynamic data stores that can easily accommodate diverse and a priori unknown data types and semantics
– Methods and tools that leverage dynamic and diverse content
26
Agile Data Management
New highly distributed computing (MapReduce) and data processing (Bigtable) paradigms, and technologies based on them (hadoop.apache.org/, hbase.apache.org/), help in solving data management problems:
– Store and process Volume
– Keep up with Velocity
– Represent Variety
These technologies are not meant (and never were meant) to provide data interpretation
– In data systems we have been dealing with data the meaning of which we knew (usually via data applications)
– These technologies do not help in solving the problem of data integration and interoperability of systems
27
Agile Utilization
Today, the main problem of Big Data is how to use it
– Utilization of ‘Variety’: diverse and a priori unknown types and semantics
– Ability of Big Data systems to interoperate
– Ability to integrate Big Data
– The last two problems are inherently difficult and cannot be properly addressed by the data technology itself
Traditional data utilization and integration approaches fail
Relying on legacy data models and mappings (linked open data) fails: it creates forking and mapping degradation
Agile utilization and integration paradigms are needed
28
The Problem of Horizontal Integration of Big Intelligence Data
HI =Def. the ability to exploit multiple data sources as if they are one
Recognized issues for HI with existing approaches
– Data silos
– Lexicon/semantics silos
Requirement for HI of Big Intelligence Data
– Agile Semantic Interoperability
A strategy for HI must be agile in the sense that it can be quickly extended to new zones of emerging data according to need
Ontology allows an incremental approach: big bang from the very first buck (as we showed on the project described below)
Ontology can provide the needed agility
29
Agile Semantic Interoperability
A good solution has to be
– Able to grow incrementally
– Able to be developed in a distributed manner
– Without losing consistency
– Independent of particular implementations, and data producers and consumers
– Applicable to data in an agile manner
We call our solution: ‘semantic enhancement’ (SE) of data
30
Explication vs. Annotation
Explication of general terms used in source intelligence artifacts and in data models, terminologies and doctrinal publications which provide typologies of intelligence-related IAs, to semantically enhance data in a way that enables computational integration and reasoning
Annotation of the instance-level information captured by such IAs, to aid retrieval of information about specific persons, groups, events, documents, images, and so forth
31
SE Types
Explication of general terms used in source intelligence artifacts and in data models, terminologies and doctrinal publications which provide typologies of intelligence-related IAs, to semantically enhance data in a way that enables computational integration and reasoning
Annotation of the instance-level information captured by such IAs, to aid retrieval of information about specific persons, groups, events, documents, images, and so forth
32
SE
SE is realized with the help of ontologies that are used to explicate data models and annotate data instances
– Vocabulary of ontologies used for explications and annotations provides agile horizontal integration
– Ontologies, by virtue of their nature and organization, provide semantic enhancement of data
Example table:
PersonID | Name | Description
111 | Java | Programming
222 | SQL | Database
Ontology terms shown in the diagram: SQL, Java, C++, ProgrammingSkill, ComputerSkill, Skill, Education, Technical Education
33
The Meaning of ‘Enhancement’
Semantic enhancement/enrichment of data = arm’s length approach (no change to data)
– through simple explication we associate an entire knowledge system with a database field
– this enables analytics to process data, e.g. about computer skills, “vertically” along the Skill hierarchy, as well as “horizontally” via relations between Skill and Education
– and further: while data in the database does not change, its analysis can become richer and richer as our understanding of reality changes
For this richness to be leveraged by different communities, persons, and applications, it needs to have the properties mentioned above and be constructed in accordance with the principles of SE
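The arm's-length idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows come from the example table, while the is-a hierarchy, the `ACQUIRED_THROUGH` relation, and all function names are hypothetical and are not taken from the SE implementation.

```python
# Source rows are never modified (arm's-length approach).
ROWS = [
    {"PersonID": 111, "Name": "Java", "Description": "Programming"},
    {"PersonID": 222, "Name": "SQL", "Description": "Database"},
]

# Assumed is-a hierarchy, built from the term labels on the slide.
IS_A = {
    "Java": "ProgrammingSkill",
    "SQL": "ProgrammingSkill",
    "C++": "ProgrammingSkill",
    "ProgrammingSkill": "ComputerSkill",
    "ComputerSkill": "Skill",
    "Technical Education": "Education",
}

# Hypothetical "horizontal" relation between Skill and Education terms.
ACQUIRED_THROUGH = {"ComputerSkill": "Technical Education"}

# Explication: the database field "Name" is declared to hold Skill terms.
EXPLICATION = {"Name": "Skill"}


def ancestors(term):
    """Return every more general term above `term`, nearest first."""
    chain = []
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain


def persons_with(skill_type):
    """Vertical query: PersonIDs whose skill is `skill_type` or a subtype of it."""
    field = next(f for f, t in EXPLICATION.items() if t == "Skill")
    return [
        row["PersonID"]
        for row in ROWS
        if row[field] == skill_type or skill_type in ancestors(row[field])
    ]


def related_education(term):
    """Horizontal query: follow the Skill-to-Education relation, if any."""
    for candidate in [term] + ancestors(term):
        if candidate in ACQUIRED_THROUGH:
            return ACQUIRED_THROUGH[candidate]
    return None
```

For example, `persons_with("ComputerSkill")` finds both persons even though neither row mentions that term. The knowledge sits in the ontology, not in the table.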
34
SE Principles
– Create a Shared Semantic Resource (SSR) of ontologies to be used for explication and annotation
– Establish an agile strategy for building ontologies within this SSR, and apply and extend these ontologies to explicate and annotate new source data as they come onstream
– Problem: Given the immense and growing variety of data sources, the development methodology must be applied by multiple different groups
– How to manage collaboration?
35
Achieving the Goal
– Methodology of incremental distributed ontology development
– A common ontology architecture incorporating a common, domain-neutral, upper-level ontology (BFO)
– A shared governance and change management process
– A simple, repeatable process for ontology development
– An ontology registry
– A process of intelligence data capture through explication of source data models
36
Main Methodological Points
Ontological realism
– Based on Doctrine / Science
– Involves SMEs in label selection and definition
– Thoroughly tested in many projects
Arm’s-length process, with minimal disturbance to existing data and data semantics
Reference ontologies capture generic content and are designed for aggressive reuse in multiple different types of context
– Single reference ontology for each domain of interest
Application ontologies are tied to specific local applications
– An application ontology is created by combining local content with generic content taken from relevant reference ontologies
– Still interoperable because based on a common set of reference ontologies
* Barry Smith and Werner Ceusters, “Ontological Realism as a Methodology for Coordinated Evolution of Scientific Ontologies”, Applied Ontology, 5 (2010), 139–188.
37
Arm’s-length Process
Focusing on the terms (labels, acronyms, codes) used in our source data
– Where multiple distinct terms {t1, …, tn} are used in separate data sources with one and the same meaning, they are associated with a single preferred label drawn from a standard set of such labels
– All the separate data items associated with the {t1, …, tn} are thereby linked together through the corresponding preferred labels
– Preferred labels form the basis for the ontologies we build
Diagram labels: SE ontology labels; Heterogeneous Contents (ABC, KLM, XYZ)
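A minimal sketch of the preferred-label step, assuming a hand-built lookup table. The source terms, source names, and record identifiers below are invented for illustration; only the idea of mapping {t1, …, tn} to one preferred label comes from the slide.

```python
from collections import defaultdict

# Hypothetical mapping from source-specific terms to preferred labels.
PREFERRED = {
    "veh": "vehicle",
    "VEHICLE_TYPE": "vehicle",
    "vehicule": "vehicle",
    "trk": "tractor",
}

# Hypothetical data items: (source, term used in that source, item id).
ITEMS = [
    ("source_a", "veh", "rec-1"),
    ("source_b", "VEHICLE_TYPE", "rec-2"),
    ("source_c", "trk", "rec-3"),
    ("source_c", "unmapped_term", "rec-4"),
]


def link_by_preferred_label(items, preferred):
    """Group data items under the preferred label of the term they use.

    Items whose term has no preferred label yet are left out; the source
    data itself is never rewritten.
    """
    linked = defaultdict(list)
    for source, term, item_id in items:
        label = preferred.get(term)
        if label is not None:
            linked[label].append((source, item_id))
    return dict(linked)
```

Here `link_by_preferred_label(ITEMS, PREFERRED)` places `rec-1` and `rec-2` under the single label `vehicle`, although the two sources use different terms.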
38
Reference and Application Ontologies
Reference Ontology
– vehicle =def. an object used for transporting people or goods
– tractor =def. a vehicle that is used for towing
– crane =def. a vehicle that is used for lifting and moving heavy objects
– vehicle platform =def. means of providing mobility to a vehicle
– wheeled platform =def. a vehicle platform that provides mobility through the use of wheels
– tracked platform =def. a vehicle platform that provides mobility through the use of continuous tracks
Application Definitions
– artillery vehicle =def. vehicle designed for the transport of one or more artillery weapons
– wheeled tractor =def. a tractor that has a wheeled platform
– tracked tractor =def. a tractor that has a tracked platform
– artillery tractor =def. an artillery vehicle that is a tractor
– wheeled artillery tractor =def. an artillery tractor that has a wheeled platform
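One way to read these definitions is that each application term is a combination of reference terms. The Python sketch below encodes that reading. The `Item` record, its field names, and the `classify` helper are hypothetical, and the split between reference and application terms is an assumption based on the two groups of definitions.

```python
from dataclasses import dataclass

# Reference ontology: is-a links taken from the definitions.
REFERENCE_IS_A = {
    "tractor": "vehicle",
    "crane": "vehicle",
    "wheeled platform": "vehicle platform",
    "tracked platform": "vehicle platform",
}


def is_a(term, ancestor):
    """True if `term` equals `ancestor` or lies below it in the reference ontology."""
    while term is not None:
        if term == ancestor:
            return True
        term = REFERENCE_IS_A.get(term)
    return False


@dataclass(frozen=True)
class Item:
    kind: str                           # a reference term such as "tractor"
    platform: str                       # "wheeled platform" or "tracked platform"
    transports_artillery: bool = False  # hypothetical flag for illustration


def artillery_vehicle(x):
    return is_a(x.kind, "vehicle") and x.transports_artillery


def wheeled_tractor(x):
    return is_a(x.kind, "tractor") and x.platform == "wheeled platform"


def tracked_tractor(x):
    return is_a(x.kind, "tractor") and x.platform == "tracked platform"


def artillery_tractor(x):
    return artillery_vehicle(x) and is_a(x.kind, "tractor")


def wheeled_artillery_tractor(x):
    return artillery_tractor(x) and x.platform == "wheeled platform"


# Application definitions: each one only combines reference content.
APPLICATION_DEFS = {
    "artillery vehicle": artillery_vehicle,
    "wheeled tractor": wheeled_tractor,
    "tracked tractor": tracked_tractor,
    "artillery tractor": artillery_tractor,
    "wheeled artillery tractor": wheeled_artillery_tractor,
}


def classify(item):
    """Return the application terms that `item` satisfies, sorted by name."""
    return sorted(name for name, test in APPLICATION_DEFS.items() if test(item))
```

Because every application term is built from the same reference terms, two applications that define different local terms can still compare their data at the reference level.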
39
Illustration of Ontology Types (Toy Example)
Nodes in the diagram: Vehicle, Tractor, Wheeled Tractor, Artillery Tractor, Wheeled Artillery Tractor, Artillery Vehicle
Legend: black = reference ontologies; red = application ontologies
40
Role of Reference Ontologies
Normalized
– Maintains a set of consistent ontologies
– Eliminates redundancy
Modular
– A set of plug-and-play ontology modules
– Enables distributed consistent development
Surveyable
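The non-redundancy requirement can be checked mechanically. The sketch below is one possible check, not a tool named in the talk: it reports any term declared by more than one module. The module names and their contents are invented for illustration.

```python
from collections import defaultdict


def redundant_terms(modules):
    """Return each term declared by more than one module, with its modules sorted."""
    owners = defaultdict(list)
    for module, terms in modules.items():
        for term in terms:
            owners[term].append(module)
    return {term: sorted(mods) for term, mods in owners.items() if len(mods) > 1}


# Hypothetical modules; "tractor" is deliberately declared twice.
MODULES = {
    "vehicle-ontology": {"vehicle", "tractor", "crane"},
    "platform-ontology": {"vehicle platform", "wheeled platform"},
    "artillery-ontology": {"artillery weapon", "tractor"},
}
```

A registry could run a check like this whenever a module is added, so that a duplicated term is resolved in favour of a single reference ontology for its domain.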
41
SE Architecture
– The Upper Level Ontology (ULO) in the SE hierarchy must be maximally general (no overlap with domain ontologies)
– The Mid-Level Ontologies (MLOs) introduce successively less general and more detailed representations of types which arise in successively narrower domains, until we reach the Lowest Level Ontologies (LLOs)
– The LLOs are maximally specific representations of the entities in a particular one-dimensional domain
42
Architecture Illustration
43
Challenges to HI
– Too many lexicons
– The scope of the domain: signal, sensor, image, … intelligence about … the whole world
– Difficult to conduct governance and management of ontology development to ensure consistent evolution
– Lack of expertise
– Complexity of the ontology development and application process
44
Preventing Failure
The method we use offers solutions to some of the common reasons for failure
Lack of Consensus
– Realism offers an objective standard for settling disputes over terminology. Ontology development becomes an empirical science instead of an exercise in the publication of dialects
– Governance helps to resolve conflicts and achieve consensus
High Maintenance
– Arm’s length implementation places no additional overhead onto applications
Parochialism
– Architecture and methodology prevent development of vocabularies that apply only to a single perspective
Poor Quality
– Experience prevents common mistakes in vocabularies that cause downstream problems with search and analytics
45
Preventing Failure (cont.)
Agile ontology development
– Methodology and architecture
– Growing SSR
Agile ontology application
– Incremental
– Semi-automated where possible
– Even if not as fast as some want it to be, it is still faster than creating a physical store, which will be just another silo and will still need to be integrated with the rest of the data
Once a data collection is semantically enhanced, it is integrated with all data that have been or will be semantically enhanced, without any additional effort
46
What is Next…
– IAO-Intel: An Information Artifact Ontology for the Intelligence Community (BS)
– A Survey of DSGS-A Ontology Work and Explicating and Annotating Processes (R. Rudnicki)
– Email Ontology: illustration of the methodology of ontology design and of the IAO-Intel (D. Salmen and W. Mandrick)
47
References
Barry Smith, Tatiana Malyuta, William S. Mandrick, Chia Fu, Kesny Parent, Milan Patel, “Horizontal Integration of Warfighter Intelligence Data: A Shared Semantic Resource for the Intelligence Community”, STIDS Conference, 2012.
Barry Smith, Tatiana Malyuta, David Salmen, William Mandrick, Kesny Parent, Shouvik Bardhan, Jamie Johnson, “Ontology for the Intelligence Analyst”, Crosstalk: The Journal of Defense Software Engineering, 2012.
David Salmen, Tatiana Malyuta, Alan Hansen, Shaun Cronen, Barry Smith, “Integration of Intelligence Data through Semantic Enhancement”, STIDS Conference, 2011.