Published by Silas Barnett. Modified over 9 years ago.
1
How to integrate data
Barry Smith
2
The problem: many, many silos
- DoD spends more than $6B annually developing a portfolio of more than 2,000 business systems and Web services
- these systems are poorly integrated
- they deliver redundant capabilities, make data hard to access, foster error and waste
- they prevent secondary uses of data
https://ditpr.dod.mil/
Based on FY11 Defense Information Technology Repository (DITPR) data
3
What is missing here?
4
Syntactic and semantic interoperability
- Syntactic interoperability = systems can exchange messages (realized by XML).
- Semantic interoperability = messages are interpreted in the same way by senders and receivers.
- Experience shows that specifying meanings via natural-language strings is not a viable route to achieving semantic interoperability.
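The distinction can be sketched in a few lines of Python using only the standard library. The messages, the element name `item`, and the term identifiers `EX:0001` and `EX:0002` are hypothetical illustrations, not taken from the presentation.

```python
import xml.etree.ElementTree as ET

# Two hypothetical systems send well-formed XML with identical structure.
msg_a = "<report><item>plant</item></report>"  # sender A means: a factory
msg_b = "<report><item>plant</item></report>"  # sender B means: vegetation

def parse_item(xml_text):
    """Syntactic step: any receiver can parse the message and read the string."""
    return ET.fromstring(xml_text).findtext("item")

# Syntactic interoperability holds: both messages parse to the same string,
# even though the senders meant different things by it.
same_string = parse_item(msg_a) == parse_item(msg_b)

# Semantic step: each sender attaches a term identifier from a shared
# (hypothetical) ontology instead of relying on the natural-language string.
tagged_a = '<report><item term="EX:0001">plant</item></report>'
tagged_b = '<report><item term="EX:0002">plant</item></report>'

def parse_term(xml_text):
    """Return the ontology term identifier attached to the item element."""
    return ET.fromstring(xml_text).find("item").get("term")

# With shared identifiers the receiver can tell the two meanings apart.
same_meaning = parse_term(tagged_a) == parse_term(tagged_b)

print(same_string, same_meaning)
```

The string comparison alone cannot reveal the mismatch; only the shared identifiers make it visible to a machine.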
5
Instance data vs. data about types
instances: Bill Clinton; Bill Clinton’s dog; the planet Earth
types: human being; dog; planet
6
DoD Enterprise Ontology
http://www.youtube.com/watch?v=OzW3Gc_yA9A
Dennis Wisnosky, Chief Architect & Chief Technical Officer, Business Mission Area, Office of the Deputy Chief Management Officer, US Department of Defense
7
Instance data vs. data about types
instances: Iraq; Basra; Abu Ghraib
types: country; city; prison
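A minimal Python sketch of how instance data can be kept separate from, but linked to, data about types. The pairing of each instance with a type follows the order in which the slides list them, which is an assumption; the function name `instances_by_type` is hypothetical.

```python
from collections import defaultdict

# Instance-level data: each key names a particular, each value the type
# it is assumed to instantiate (pairing inferred from slide order).
instance_of = {
    "Bill Clinton": "human being",
    "Bill Clinton's dog": "dog",
    "Iraq": "country",
    "Basra": "city",
    "Abu Ghraib": "prison",
}

def instances_by_type(mapping):
    """Group instances under the type they instantiate."""
    grouped = defaultdict(list)
    for instance, type_name in mapping.items():
        grouped[type_name].append(instance)
    return dict(grouped)

grouped = instances_by_type(instance_of)
print(grouped["city"])
```

Type-level terms form the shared vocabulary; instance records vary from source to source but can all point at the same small set of types.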
8
compare: legends for maps
maps vs. legends for maps
9
compare: legends for maps
common legends allow (cross-border) integration
10
The Gene Ontology
[Figure labels: MouseEcotope, GlyProt, DiabetInGene, GluChem; GO term: sphingolipid transporter activity]
11
The Gene Ontology
[Figure labels: MouseEcotope, GlyProt, DiabetInGene, GluChem; GO term: Holliday junction helicase complex]
12
The Gene Ontology
[Figure labels: MouseEcotope, GlyProt, DiabetInGene, GluChem; GO term: sphingolipid transporter activity]
13
Common legends
- help human beings use and understand complex representations of reality
- help human beings create useful complex representations of reality
- help computers process complex representations of reality
- help glue data together
But common legends serve these purposes only if the legends are developed in a coordinated, non-redundant fashion
14
International System of Units
15
The Open Biomedical Ontologies (OBO) Foundry
Axes: RELATION TO TIME (CONTINUANT: INDEPENDENT, DEPENDENT; OCCURRENT) by GRANULARITY
ORGAN AND ORGANISM: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO); Organ Function (FMP, CPRO); Phenotypic Quality (PaTO); Biological Process (GO)
CELL AND CELLULAR COMPONENT: Cell (CL); Cellular Component (FMA, GO); Cellular Function (GO)
MOLECULE: Molecule (ChEBI, SO, RnaO, PrO); Molecular Function (GO); Molecular Process (GO)
16
rationale of OBO Foundry coverage
Axes: RELATION TO TIME (CONTINUANT: INDEPENDENT, DEPENDENT; OCCURRENT) by GRANULARITY
ORGAN AND ORGANISM: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO); Organ Function (FMP, CPRO); Phenotypic Quality (PaTO); Organism-Level Process (GO)
CELL AND CELLULAR COMPONENT: Cell (CL); Cellular Component (FMA, GO); Cellular Function (GO); Cellular Process (GO)
MOLECULE: Molecule (ChEBI, SO, RNAO, PRO); Molecular Function (GO); Molecular Process (GO)
17
Population-level ontologies
Axes: RELATION TO TIME (CONTINUANT: INDEPENDENT, DEPENDENT; OCCURRENT) by GRANULARITY
COMPLEX OF ORGANISMS: Family, Community, Deme, Population; Population Phenotype; Population Process
ORGAN AND ORGANISM: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO); Organ Function (FMP, CPRO); Phenotypic Quality (PaTO); Biological Process (GO)
CELL AND CELLULAR COMPONENT: Cell (CL); Cellular Component (FMA, GO); Cellular Function (GO)
MOLECULE: Molecule (ChEBI, SO, RnaO, PrO); Molecular Function (GO); Molecular Process (GO)
18
Environment Ontology
Axes: RELATION TO TIME (CONTINUANT: INDEPENDENT, DEPENDENT; OCCURRENT) by GRANULARITY
ORGAN AND ORGANISM: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO); Organ Function (FMP, CPRO); Phenotypic Quality (PaTO); Biological Process (GO)
CELL AND CELLULAR COMPONENT: Cell (CL); Cellular Component (FMA, GO); Cellular Function (GO)
MOLECULE: Molecule (ChEBI, SO, RnaO, PrO); Molecular Function (GO); Molecular Process (GO)
Figure label: environments
19
Axes: RELATION TO TIME (CONTINUANT: INDEPENDENT, DEPENDENT; OCCURRENT) by GRANULARITY
COMPLEX OF ORGANISMS: Family, Community, Deme, Population; Population Phenotype; Population Process
ORGAN AND ORGANISM: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO); Organ Function (FMP, CPRO); Phenotypic Quality (PaTO); Biological Process (GO)
CELL AND CELLULAR COMPONENT: Cell (CL); Cell Component (FMA, GO); Cellular Function (GO)
MOLECULE: Molecule (ChEBI, SO, RnaO, PrO); Molecular Function (GO); Molecular Process (GO)
Figure label: ENVIRONMENT
http://obofoundry.org
20
OBO Foundry approach being applied in the following biology domains
- NIF Standard: Neuroscience Information Framework
- ISF Ontologies: Integrated Semantic Framework
- OGMS and Extensions: Ontology for General Medical Science
- IDO Consortium: Infectious Disease Ontology
- cROP: Common Reference Ontologies for Plants
21
What do ontology annotations do?
- make data retrievable even by those not involved in their creation
- allow integration of data deriving from heterogeneous sources
- break down the walls of roach motels
22
Applying the annotations approach to military data via Semantic Enhancement
- data remain in their original state (are treated at ‘arm’s length’)
- data are ‘tagged’ using interoperable ontologies created in a coordinated way
- allows flexible response to new needs, adjustable in real time
- can be as complete as needed, lossless, long-lasting because flexible and responsive
- big bang for buck: measurable benefit even from first small investments
The strategy works only to the degree that it rests on shared governance and training
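One way to picture the arm's-length tagging described above is the following Python sketch. It is an illustration under stated assumptions, not the presentation's implementation: the two sources, their field names, and the `EX:` term identifiers are all hypothetical.

```python
# Source records stay exactly as delivered; field names differ per source.
SOURCE_A = [{"id": "a1", "loc": "Basra", "kind": "city"}]
SOURCE_B = [{"id": "b7", "place_name": "Abu Ghraib", "category": "prison"}]

# Annotations live outside the sources: (source, field) -> shared term.
ANNOTATIONS = {
    ("A", "loc"): "EX:place_name",
    ("B", "place_name"): "EX:place_name",
    ("A", "kind"): "EX:place_type",
    ("B", "category"): "EX:place_type",
}

def enhanced_view(source_name, records):
    """Yield each record re-keyed by shared terms, without mutating the source."""
    for record in records:
        view = {"source": source_name, "id": record["id"]}
        for field, value in record.items():
            term = ANNOTATIONS.get((source_name, field))
            if term is not None:
                view[term] = value
        yield view

def query(term, value):
    """Search both sources through the shared terms, as if they were one."""
    rows = list(enhanced_view("A", SOURCE_A)) + list(enhanced_view("B", SOURCE_B))
    return [row["id"] for row in rows if row.get(term) == value]

print(query("EX:place_type", "prison"))
```

Covering a new source in this sketch means adding entries to `ANNOTATIONS`; the source records themselves are never rewritten.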
23
Benefits of the Approach
- Does not interfere with the source content
- Enables content to evolve in a cumulative fashion as it accommodates new kinds of data
- Does not depend on the data resources and can be developed independently from them in an incremental and distributed fashion
- Provides a more consistent, homogeneous, and well-articulated presentation of the content which originates in multiple internally inconsistent and heterogeneous systems
24
Benefits of the Approach
- Makes management and exploitation of the content more cost-effective
- Allows graceful integration with other government initiatives and brings the system closer to the federally mandated net-centric data strategy
- Creates incrementally an integrated content that is effectively searchable and that provides content to which more powerful analytics can be applied
25
Building the Shared Semantic Resource
- Methodology of distributed incremental development
- Training
- Governance
- Common Architecture of Ontologies to support consistency, non-redundancy, modularity
  - Upper Level Ontology (BFO)
  - Mid-Level Ontologies
  - Low Level Ontologies
28
Goal: To realize Horizontal Integration (HI) of intelligence data
HI =Def. the ability to exploit multiple data sources as if they are one
Problem: the data coming onstream are out of our control
Any strategy for HI must be agile in the sense that it can be quickly extended to new zones of emerging data according to need
29
Army Intelligence and Information Warfare Directorate (I2WD)
Create an agile strategy for building ontologies within a Shared Semantic Resource (SSR) and apply and extend these ontologies to annotate new source data as they come onstream
Problem: Given the immense and growing variety of data sources, the development methodology must be applied by multiple different groups
How to manage collaboration?
30
Why do large-scale ontology projects fail?
- focus on vocabularies, lexicons, with no logical structure, no attention to life cycle
- failure of housekeeping yields redundancy and therefore forking
- the same data is annotated in different ways by users of different ontology fragments
- data is siloed as before
HOW TO BUILD THE NEEDED LOGIC INTO THE ARCHITECTURE OF THE ONTOLOGIES?
31
MeSH (Medical Subject Headings)
MeSH Descriptors > Anthropology, Education, Sociology and Social Phenomena > Social Sciences > Political Systems > National Socialism
National Socialism is_a Political Systems
National Socialism is_a Anthropology...
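The following Python sketch shows what happens when every parent link in the MeSH path on this slide is read as is_a and followed upward. The parent links are read off the slide; treating them as a transitive is_a relation is the assumption being illustrated, and the function name `ancestors` is hypothetical.

```python
# Parent links read off the MeSH path shown on the slide.
PARENT = {
    "National Socialism": "Political Systems",
    "Political Systems": "Social Sciences",
    "Social Sciences": "Anthropology, Education, Sociology and Social Phenomena",
}

def ancestors(term, parent=PARENT):
    """Walk upward, collecting every term reachable through parent links."""
    found = []
    while term in parent:
        term = parent[term]
        found.append(term)
    return found

# If each link is read as is_a, every ancestor yields an is_a assertion,
# including ones that are hard to accept as statements about types.
for ancestor in ancestors("National Socialism"):
    print(f"National Socialism is_a {ancestor}")
```

A hierarchy built for document indexing produces questionable assertions as soon as its links are given a logical reading.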
32
Examples of Principles
- All terms in all ontologies should be singular nouns
- Same relations between terms should be reused in every ontology
- Reference ontologies should be based on single inheritance
- All definitions should be of the form: an S = Def. a G which Ds, where ‘G’ (for: genus) is the parent term of S (for: species) in the corresponding reference ontology
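Two of these principles, single inheritance and the genus-differentia form of definitions, lend themselves to mechanical checking. The sketch below is one hedged way to do so in Python; the ontology fragment, the definitions, and the function name `check_term` are hypothetical and serve only to illustrate the checks.

```python
import re

# Hypothetical reference-ontology fragment: term -> list of parent terms.
PARENTS = {
    "dog": ["mammal"],
    "mammal": ["animal"],
    "animal": [],
}

# Hypothetical definitions; the wording is illustrative only.
DEFINITIONS = {
    "dog": "a mammal which barks",
    "mammal": "a thing which nurses its young",  # genus is not the parent term
}

# Matches the right-hand side of "an S = Def. a G which Ds".
DEFINITION_FORM = re.compile(r"^an? (?P<genus>.+?) which (?P<differentia>.+)$")

def check_term(term):
    """Return a list of principle violations found for one term."""
    problems = []
    parents = PARENTS.get(term, [])
    if len(parents) > 1:
        problems.append("multiple parents (single inheritance violated)")
    definition = DEFINITIONS.get(term)
    if definition is not None:
        match = DEFINITION_FORM.match(definition)
        if match is None:
            problems.append("definition is not of the form 'a G which Ds'")
        elif parents and match.group("genus") != parents[0]:
            problems.append("genus in definition is not the parent term")
    return problems

for term in PARENTS:
    print(term, check_term(term))
```

Because the genus of each definition must equal the parent term, the definitions and the is_a hierarchy can be checked against each other automatically.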
33
Extension Strategy + Modular Organization
top level: Basic Formal Ontology (BFO)
mid-level: Information Artifact Ontology (IAO); Ontology for Biomedical Investigations (OBI); Spatial Ontology (BSPO)
domain level: Anatomy Ontology (FMA*, CARO); Environment Ontology (EnvO); Infectious Disease Ontology (IDO*); Biological Process Ontology (GO*); Cell Ontology (CL); Cellular Component Ontology (FMA*, GO*); Phenotypic Quality Ontology (PaTO); Subcellular Anatomy Ontology (SAO); Sequence Ontology (SO*); Molecular Function (GO*); Protein Ontology (PRO*)
34
- Ontologies are built as orthogonal modules which form an incrementally evolving network
- developers and SMEs are motivated to commit to developing ontologies because they will need in their own work ontologies that fit into this network
- users are motivated by the assurance that the ontologies they turn to are maintained by experts
35
More benefits of orthogonality
- helps those new to ontology to find what they need
- to find models of good practice
- ensures mutual consistency of ontologies (trivially)
- and thereby ensures additivity of annotations
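A rough Python sketch of why orthogonality makes annotations additive. Orthogonality is simplified here to "no term is defined by more than one module", which is an assumption; the ontology names come from the slides, while the term lists, record identifiers, and function names are hypothetical.

```python
def overlapping_terms(ontologies):
    """Map each term defined by more than one ontology to those ontologies."""
    owners = {}
    for name, terms in ontologies.items():
        for term in terms:
            owners.setdefault(term, []).append(name)
    return {term: names for term, names in owners.items() if len(names) > 1}

def merge_annotations(*annotation_sets):
    """Union per-record annotation sets produced by different groups."""
    merged = {}
    for annotations in annotation_sets:
        for record_id, terms in annotations.items():
            merged.setdefault(record_id, set()).update(terms)
    return merged

# Illustrative modules; the term lists are not real ontology content.
ontologies = {"CL": {"cell"}, "GO": {"biological process"}, "PaTO": {"quality"}}

# Two groups annotate the same records using different modules.
group_1 = {"r1": {"cell"}}
group_2 = {"r1": {"quality"}, "r2": {"cell"}}

print(overlapping_terms(ontologies))
print(merge_annotations(group_1, group_2))
```

When `overlapping_terms` returns an empty mapping, annotations made by different groups can simply be unioned, since no term has two competing definitions.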
36
More benefits of orthogonality
- No need to reinvent the wheel for each new domain
- Can profit from storehouse of lessons learned
- Can more easily reuse what is made by others
- Can more easily reuse training
- Can more easily inspect and criticize results of others’ work
- Leads to innovations (e.g. MIREOT, OntoFox) in strategies for combining ontologies