Public Health Ontology 101 Mark A. Musen, M.D., Ph.D. Stanford Center for Biomedical Informatics Research Stanford University School of Medicine Die Seuche.

Slides:



Advertisements
Similar presentations
The National Center for Biomedical Ontology One of three National Centers for Biomedical Computing launched by NIH in 2005 Collaboration of Stanford, Berkeley,
Advertisements

1 Using Ontologies in Clinical Decision Support Applications Samson W. Tu Stanford Medical Informatics Stanford University.
BioPortal: A Web Repository and Services for Biomedical Ontologies and Data Resources Natasha Noy and the BioPortal team Stanford Center for Biomedical.
BioPortal Status and Plans September 2011 Ray Fergerson NCBO Project Director Stanford University 1.
BioPortal as (the only functional) OOR SandBox (so far) Natasha Noy, Michael Dorf Stanford University.
1 The Business Case for Large-Scale Ontology Projects: Are we at a tipping point? Mark A. Musen, M.D., Ph.D. Stanford Medical Informatics Stanford University.
Opportunistic Reasoning for the Semantic Web: Adapting Reasoning to the Environment Carlos Pedrinaci Tim Smithers and Amaia Bernaras.
DELOS Highlights COSTANTINO THANOS ITALIAN NATIONAL RESEARCH COUNCIL.
ISWC Doctoral Symposium Monday, 7 November 2005
Who will Classify the Classifications? Differing Views of Biomedical Ontology Mark A. Musen Stanford Medical Informatics Stanford University.
E-Preference: A Tool for Incorporating Patient Preferences into Health Decision Aids Amar K. Das, MD, PhD Assistant Professor Departments of Medicine (Medical.
Consistent and standardized common model to support large-scale vocabulary use and adoption Robust, scalable, and common API to reduce variation in clinical.
The National Center for Biomedical Ontology Online Knowledge Resources for the Industrial Age Mark A. Musen Stanford University
1 Building Ontologies from the Ground Up When users set out to model their professional activity Mark A. Musen Professor of Medicine and Computer Science.
Biomedical Informatics Some Observations on Clinical Data Representation in EHRs Christopher G. Chute, MD DrPH, Mayo Clinic Chair, ICD11 Revision, World.
Semantic Web and Web Mining: Networking with Industry and Academia İsmail Hakkı Toroslu IST EVENT 2006.
BioSTORM: A Test Bed for Configuring and Evaluating Biosurveillance Methods Samson W. Tu, M.S., 1 Martin J. O’Connor, M.Sc., 1 David L. Buckeridge, M.D,
Biological Ontologies Neocles Leontis April 20, 2005.
ReQuest (Validating Semantic Searches) Norman Piedade de Noronha 16 th July, 2004.
PROMPT: Algorithm and Tool for Automated Ontology Merging and Alignment Natalya F. Noy and Mark A. Musen.
Ontologies in Biomedicine Mark A. Musen Stanford University.
Text Mining: Finding Nuggets in Mountains of Textual Data Jochen Dijrre, Peter Gerstl, Roland Seiffert Presented by Huimin Ye.
Medical Informatics Basics
Semantic Web Technologies Lecture # 2 Faculty of Computer Science, IBA.
1 Joined up Health and Bio Informatics: Joined up Health and Bio Informatics: Alan Rector Bio and Health Informatics Forum/ Medical Informatics Group Department.
“Enhancing Reuse with Information Hiding” ITT Proceedings of the Workshop on Reusability in Programming, 1983 Reprinted in Software Reusability, Volume.
1 These courseware materials are to be used in conjunction with Software Engineering: A Practitioner’s Approach, 5/e and are provided with permission by.
SNOMED CT Denise Downs Knowledge Management & Education Lead Data Standards, Technology Office Department of Health Informatics Directorate.
Medical Informatics Basics
Of 39 lecture 2: ontology - basics. of 39 ontology a branch of metaphysics relating to the nature and relations of being a particular theory about the.
BioHealth Informatics Group Advanced OWL Tutorial 2005 Ontology Engineering in OWL Alan Rector & Jeremy Rogers BioHealth Informatics Group.
1 Chapter 14 Architectural Design 2 Why Architecture? The architecture is not the operational software. Rather, it is a representation that enables a.
Medical Informatics Basics Lection 1 Associated professor Andriy Semenets Department of Medical Informatics.
A Case Study of ICD-11 Anatomy Value Set Extraction from SNOMED CT Guoqian Jiang, PhD ©2011 MFMER | slide-1 Division of Biomedical Statistics & Informatics,
Manchester Medical Informatics Group OpenGALEN 1 Linking Formal Ontologies: Scale, Granularity and Context Alan Rector Medical Informatics Group, University.
Knowledge Representation and Indexing Using the Unified Medical Language System Kenneth Baclawski* Joseph “Jay” Cigna* Mieczyslaw M. Kokar* Peter Major.
Ontology Summit2007 Survey Response Analysis -- Issues Ken Baclawski Northeastern University.
A service-oriented middleware for building context-aware services Center for E-Business Technology Seoul National University Seoul, Korea Tao Gu, Hung.
Metadata and Geographical Information Systems Adrian Moss KINDS project, Manchester Metropolitan University, UK
EU Project proposal. Andrei S. Lopatenko 1 EU Project Proposal CERIF-SW Andrei S. Lopatenko Vienna University of Technology
Value Set Resolution: Build generalizable data normalization pipeline using LexEVS infrastructure resources Explore UIMA framework for implementing semantic.
1 Introduction to Software Engineering Lecture 1.
Knowledge Representation of Statistic Domain For CBR Application Supervisor : Dr. Aslina Saad Dr. Mashitoh Hashim PM Dr. Nor Hasbiah Ubaidullah.
Ontologies in Biomedicine What is the “right” amount of semantics? Mark A. Musen Stanford University.
©Ferenc Vajda 1 Semantic Grid Ferenc Vajda Computer and Automation Research Institute Hungarian Academy of Sciences.
SSO: THE SYNDROMIC SURVEILLANCE ONTOLOGY Okhmatovskaia A, Chapman WW, Collier N, Espino J, Conway M, Buckeridge DL Ontology Description The SSO was developed.
Discovering Descriptive Knowledge Lecture 18. Descriptive Knowledge in Science In an earlier lecture, we introduced the representation and use of taxonomies.
A Context Model based on Ontological Languages: a Proposal for Information Visualization School of Informatics Castilla-La Mancha University Ramón Hervás.
Data Integration and Management A PDB Perspective.
Performance evaluation on grid Zsolt Németh MTA SZTAKI Computer and Automation Research Institute.
Christoph F. Eick University of Houston Organization 1. What are Ontologies? 2. What are they good for? 3. Ontologies and.
OWL Representing Information Using the Web Ontology Language.
Using Domain Ontologies to Improve Information Retrieval in Scientific Publications Engineering Informatics Lab at Stanford.
Of 33 lecture 1: introduction. of 33 the semantic web vision today’s web (1) web content – for human consumption (no structural information) people search.
Issues in Ontology-based Information integration By Zhan Cui, Dean Jones and Paul O’Brien.
1 Class exercise II: Use Case Implementation Deborah McGuinness and Peter Fox CSCI Week 8, October 20, 2008.
Winter 2011SEG Chapter 11 Chapter 1 (Part 1) Review from previous courses Subject 1: The Software Development Process.
DANIELA KOLAROVA INSTITUTE OF INFORMATION TECHNOLOGIES, BAS Multimedia Semantics and the Semantic Web.
Approach to building ontologies A high-level view Chris Wroe.
A Portrait of the Semantic Web in Action Jeff Heflin and James Hendler IEEE Intelligent Systems December 6, 2010 Hyewon Lim.
SAGE Nick Beard Vice President, IDX Systems Corp..
Don’t know much about philosophy: The confusion over bio-ontologies
The National Center for Biomedical Ontology
BioPortal as (the only functional) OOR SandBox (so far)
Department of Biological and Medical Physics
ece 627 intelligent web: ontology and beyond
Kenneth Baclawski et. al. PSB /11/7 Sa-Im Shin
Neil A. Ernst, Margaret-Anne Storey, Polly Allen, Mark Musen
The National Center for Biomedical Ontology
Building Ontologies with Protégé-2000
Presentation transcript:

Public Health Ontology 101 Mark A. Musen, M.D., Ph.D. Stanford Center for Biomedical Informatics Research Stanford University School of Medicine Die Seuche (The Plague), A. Paul Weber, Courtesy of the NLM

Many Factors can Influence the Effectiveness of Outbreak Detection Progression of disease within individuals Population and exposure characteristics Surveillance system characteristics From Buehler et al. EID 2003;9:

Computational Challenges Access to data Interpretation of data Integration of data Identification of appropriate analytic methods Coordination of problem solving to address diverse data sources Determining what to report

Interpretation of data Clinical data useful for public health surveillance are often collected for other purposes (e.g., diagnostic codes for patient care, billing) Such data may be biased by a variety of factors – Desire to protect the patient – Desire to maximize reimbursement – Desire to satisfy administrative requirements with minimal effort Use of diagnostic codes is problematic because precise definitions generally are unknownboth to humans and computers

A Small Portion of ICD9-CM 724Unspecified disorders of the back 724.0Spinal stenosis, other than cervical Spinal stenosis, unspecified region Spinal stenosis, thoracic region Spinal stenosis, lumbar region Spinal stenosis, other 724.1Pain in thoracic spine 724.2Lumbago 724.3Sciatica 724.4Thoracic or lumbosacral neuritis 724.5Backache, unspecified 724.6Disorders of sacrum 724.7Disorders of coccyx Unspecified disorder of coccyx Hypermobility of coccyx Coccygodynia 724.8Other symptoms referable to back 724.9Other unspecified back disorders

The combinatorial explosion 1970s ICD9: 8 Codes

ICD10 (1999): 587 codes for such accidents V31.22 Occupant of three-wheeled motor vehicle injured in collision with pedal cycle, person on outside of vehicle, nontraffic accident, while working for income W65.40 Drowning and submersion while in bath-tub, street and highway, while engaged in sports activity X35.44 Victim of volcanic eruption, street and highway, while resting, sleeping, eating or engaging in other vital activities

Syndromic Surveillance Requires enumeration of relevant syndromes Requires mapping of codes (usually in ICD9) to corresponding syndromes Is complicated by the difficulty of enumerating all codes that appropriately support each syndrome Is complicated by lack of consensus on what the right syndromes are in the first place

There is no consistency in how syndromes are defined or monitored SystemSyndrome ENCOMPASS Respiratory illness with fever CDC MMWR 10/01 Unexplained febrile illness associated with pneumonia RSVP, New Mexico Influenza-like illness Santa Clara County Flu-like symptoms Winter Olympics, Utah 2002 Respiratory infection with fever consensus definition

The solution to the terminology mess: Ontologies Machine-processable descriptions of what exists in some application area Allows computer to reason about – Concepts in the world – Attributes of concepts – Relationships among concepts Provides foundation for – Intelligent computer systems – The Semantic Web

What Is An Ontology? The study of being A discipline co-opted by computer science to enable the explicit specification of – Entities – Properties and attributes of entities – Relationships among entities A theory that provides a common vocabulary for an application domain

Supreme genus: SUBSTANCE Subordinate genera: BODYSPIRIT Differentiae: material immaterial Differentiae: animate inanimate Differentiae: sensitiveinsensitive Subordinate genera: LIVING MINERAL Proximate genera: ANIMAL PLANT Species: HUMANBEAST Differentiae: rational irrational Individuals: Socrates Plato Aristotle … Porphyrys depiction of Aristotles Categories

Heart Cavity of Heart Wall of Heart Right Atrium Cavity of Right Atrium Wall of Right Atrium Fossa Ovalis Myocardium Sinus Venarum SA Node Myocardium of Right Atrium Cardiac Chamber Hollow Viscus Internal Feature Organ Cavity Organ Cavity Subdivision Anatomical Spatial Entity Anatomical Feature Body Space Organ Component Organ Subdivision Viscus Organ Part Organ Anatomical Structure Parts of the heart Foundational Model of Anatomy Is-a Part-of

The FMA demonstrates that distinctions are not universal Blood is not a tissue, but rather a body substance (like saliva or sweat) The pericardium is not part of the heart, but rather an organ in and of itself Each joint, each tendon, each piece of fascia is a separate organ These views are not shared by many anatomists!

Why develop an ontology? To share a common understanding of the entities in a given domain – among people – among software agents – between people and software To enable reuse of data and information – to avoid re-inventing the wheel – to introduce standards to allow interoperability and automatic reasoning To create communities of researchers

We really want ontologies in electronic form Ontology contents can be processed and interpreted by computers Interactive tools can assist developers in ontology authoring

The NCI Thesaurus in Protégé-OWL

Goals of Biomedical Ontologies To provide a classification of biomedical entities To annotate data to enable summarization and comparison across databases To provide for semantic data integration To drive natural-language processing systems To simplify the engineering of complex software systems To provide a formal specification of biomedical knowledge

Biosurveillance Data Sources Ontology

Ontology defines how data should be accessed from the database

Ontologies: Good news and bad news The Good news – Ontologies allow computers to understand definitions of concepts and to relate concepts to one another – Automated inheritance of attributes makes it very easy to add new concepts to an ontology over time – Ontologies can be developed in standard knowledge- representation languages that have wide usage The Bad news – Most current biomedical ontologies have been developed using non-standard languages – Its still very hard to get people to agree about the content of proposed ontologies

Computational Challenges Access to data Interpretation of data Integration of data Identification of appropriate analytic methods Coordination of problem solving to address diverse data sources Determining what to report

The Medical Entities Dictionary (after Cimino) MED Patient Registration Clinical Laboratory Radiology Pharmacy

Ontologies for data integration Hema- tology Lab Result Serum Chemistry Electro- lytes Amino- transferases Sodium HCO 3 Patient Database 1 Patient Database 2 Patient Database 3 HCO 3 Bicarbonate Bicarb HCO 3 – Ontology of patient data (Canonical data value)

Computational Challenges Access to data Interpretation of data Integration of data Identification of appropriate analytic methods Coordination of problem solving to address diverse data sources Determining what to report

Different types of data require different types of problem solvers Are the data multivariate or univariate? Do the data involve temporal or spatial dimensions? Are the data categorical or probabilistic? Are the data acquired as a continuous stream or as a batch? Is it possible for temporal data to arrive out of order? What is the rate of data acquisition and what are the numbers of data that need to be processed?

Obtain Current Observation Binary Alarm Transform Data Forecast Compute Test Value Estimate Model Parameters Obtain Baseline Data Evaluate Test Value Compute Expectation Empirical Forecasting Moving Average Mean, StDev Database Query Aberrancy Detection (Temporal) Residual-Based Layered Alarm EWMA Cumulative Sum P-Value.. Constant (theory-based) Outlier Removal Smoothing.. GLM Model Fitting Trend Estimation.. GLM Forecasting Compute Residual Evaluate Residual Binary Alarm Aberrancy Detection (Control Chart) Layered Alarm Raw Residual Z-Score.. EWMA Generalized Exponential Smoothing ARIMA Model Fitting Signal Processing Filter ARIMA Forecasting

BioSTORM: A Prototype Next-Generation Surveillance Sytem Developed at Stanford, initially with funding from DARPA, now from CDC Provides a test bed for evaluating alternative data sources and alternative problem solvers Demonstrates – Use of ontologies for data acquisition and data integration – Use of a high-performance computing system for scalable data analysis

Data Sources Data Regularization Middleware Epidemic Detection Problem Solvers Control Structure BioSTORM Data Flow Mapping Ontology Heterogeneous Input Data Semantically Uniform Data Customized Output Data Data BrokerData Mapper Data Source Ontology

Distributed Data Sources Data Broker Data Source Ontology Heterogeneous Data Input Semantically Uniform Data Objects Data Broker and Data Source Ontology

Biosurveillance Data Sources Ontology

Ontology defines how data should be accessed from the database

Distributed Data Sources Data Broker Data Source Ontology Heterogeneous Data Input Semantically Uniform Data Objects Data Broker and Data Source Ontology

Semantically Uniform Data Objects Data Mapper Customized Data Objects Mapping Ontology Data Source Ontology Input–Output Ontology Problem Solver Data Mapper and Mapping Ontology

Data Mapper Mapping Ontologies Problem Solvers Input–Output Ontologies Varying Problem Solvers Customized Data Objects Semantically Uniform Data Objects

Obtain Current Observation Binary Alarm Transform Data Forecast Compute Test Value Estimate Model Parameters Obtain Baseline Data Evaluate Test Value Compute Expectation Empirical Forecasting Moving Average Mean, StDev Database Query Aberrancy Detection (Temporal) Residual-Based Layered Alarm EWMA Cumulative Sum P-Value.. Constant (theory-based) Outlier Removal Smoothing.. GLM Model Fitting Trend Estimation.. GLM Forecasting Compute Residual Evaluate Residual Binary Alarm Aberrancy Detection (Control Chart) Layered Alarm Raw Residual Z-Score.. EWMA Generalized Exponential Smoothing ARIMA Model Fitting Signal Processing Filter ARIMA Forecasting

Data Sources Data Regularization Middleware Epidemic Detection Problem Solvers Control Structure BioSTORM Data Flow Mapping Ontology Heterogeneous Input Data Semantically Uniform Data Customized Output Data Data BrokerData Mapper Data Source Ontology

Computational Challenges Access to data Interpretation of data Integration of data Identification of appropriate analytic methods Coordination of problem solving to address diverse data sources Determining what to report

We need to address the challenges of automating surveillance Current surveillance systems – Require major reprogramming to add new data sources or new analytic methods – Lack the ability to select data sources and analytic methods dynamically based on problem-solving requirements – Ignore qualitative data and qualitative relationships – Will not scale up to the requirements of handling huge data feeds The existing health information infrastructure – Is all-too-often paper-based – Uses 19 th century techniques for encoding knowledge about clinical conditions and situations – Remains fragmented, hindering data access and communication

The National Center for Biomedical Ontology One of three National Centers for Biomedical Computing launched by NIH in 2005 Collaboration of Stanford, Berkeley, Mayo, Buffalo, Victoria, UCSF, Oregon, and Cambridge Primary goal is to make ontologies accessible and usable Research will develop technologies for ontology dissemination, indexing, alignment, and peer review

Our Center offers Technology for uploading, browsing, and using biomedical ontologies Methods to make the online publication of ontologies more like that of journal articles Tools to enable the biomedical community to put ontologies to work on a daily basis

Local Neighborhood view Browsing/Visualizing Ontologies

BioPortal will experiment with new models for Dissemination of knowledge on the Web Integration and alignment of online content Knowledge visualization and cognitive support Peer review of online content

BioPortal is building an online community of users who Develop, upload, and apply ontologies Map ontologies to one another Comment on ontologies via marginal notes to give feedback – To the ontology developers – To one another Make proposals for specific changes to ontologies Stay informed about ontology changes and proposed changes via active feeds

Public Health Ontology 101 Mark A. Musen, M.D., Ph.D. Stanford Center for Biomedical Informatics Research Stanford University School of Medicine Die Seuche (The Plague), A. Paul Weber, Courtesy of the NLM