INNOVATION IN HEALTHCARE IT STANDARDS: THE PATH TO BIG DATA INTERCHANGE LUCIANA TRICAI CAVALINI, MD, PHD TIMOTHY WAYNE COOK, MSC.


BIG DATA IN HEALTHCARE: MYTHS (AND FACTS)

MYTH #1: "BIG DATA" HAS A UNIVERSALLY ACCEPTED, CLEAR DEFINITION
There is no consensus in the scientific literature or the specialized blogosphere about the definition of Big Data. The various definitions have the 3 Vs in common:
Volume: the existence of gigantic amounts of data
Variability: the coexistence of structured, unstructured, machine-generated, etc. data
Velocity: data is produced, and must be processed and consumed, very fast
Two of these aspects are of particular concern in healthcare: Variability and Velocity.

MYTH #2: BIG DATA IS NEW
Collecting, processing and analyzing vast amounts of data is not a new human activity. Example: medieval monks and their concordances (correlations of every single word in the Bible). What is new is the volume of the data and the speed at which it can be processed and analyzed.

MYTH #3: BIGGER DATA IS BETTER
In biomedical science, this is partially true: the bigger the sample size, the more precise the estimates. However, large samples of bad-quality data are dangerously misleading. In healthcare, precision and reliability are equally important.

MYTH #4: BIG DATA MEANS BIG MARKETING
There is no evidence that analyzing Big Data increases the number of customers. Big Data is useful when it helps surface actionable insights (e.g., a previously unknown relationship between a gene and a disease). The marketing angle has little relevance in healthcare, especially in universal healthcare systems.

HOW TO GET RELIABLE BIG DATA? TRADITIONAL STANDARDS VS. INNOVATION

THE TRADITIONAL HEALTHCARE IT STANDARDS
HL7, openEHR, ISO 13606: primary focus on message exchange among EMRs. All of them historically precede the emergence of Big Data and the Semantic Web. Top-down data modeling approach: not prepared to deal with the 3 Vs of Big Data.
SNOMED CT, LOINC, ICD: controlled vocabularies, also preceding Big Data and the Semantic Web. Main focus on pre-coordination (a top-down approach).
In other words: the traditional healthcare IT standards are not prepared to deal with Big Data.

AN ASIDE ABOUT OPENEHR
The current version of the Archetype Definition Language (ADL) is 1.4. It requires an archetype to be the maximal data set for a given concept. Taken literally, this means there can be only one archetype for each concept in the whole world. In practice, several archetypes are being developed in isolation, without being submitted to the proper governance tool (the CKM). The ADL 1.5 specification promises that the "maximal data set" requirement will be removed.

BIG DATA IS BEING PRODUCED: NOW, EVERYWHERE, LOCALLY

A BIG DATA-AWARE HEALTHCARE IT STANDARD IS:
Compliant with Semantic Web technologies
Respectful of the different points of view coming from different medical schools
Welcoming to all healthcare professionals (and their concepts)
Not limited to EMR data modeling
Prepared to deal with emerging mHealth and the Internet of Things

MULTILEVEL HEALTHCARE INFORMATION MODELING (MLHIM): AN INNOVATION IN HEALTHCARE IT STANDARDS

THE BACKGROUND - 1
The typical application design locks up semantics in the database structure and the application source code. When the semantics are missing, different use cases in different scenarios often interpret seemingly similar data differently. Multilevel modelling provides a way to share the semantics of any healthcare concept between distributed, independent applications.

THE BACKGROUND - 2
MLHIM is based on the core modelling concepts of openEHR, providing semantics external to applications. From openEHR, MLHIM inherited the multilevel model principles. MLHIM also uses certain conceptual principles from HL7 v3. From HL7, MLHIM inherited the XML-based implementation.

THE IMPLEMENTATION
MLHIM simplifies the openEHR Reference Model; it can be called a "minimalistic" multilevel model. MLHIM uses XML instead of ADL, so ubiquitous tooling and training are available; the whole Semantic Web is based on XML technologies. Because MLHIM is based on the XML Schema data model, there is no loss of information between the model semantics and their serialization in XML instances. This is a problem when serializing ADL into XML (see the next slide).

A NOTE ON ADL VS. XML
There is a loss of information when moving between an object model (the AOM) and XML Schema. dADL is the proper instance serialization for the AOM. In practice, however, implementers are serializing openEHR/ISO 13606 data in XML.

ADL VS. XML: A COMPARISON
ADL: the openEHR test suite includes approximately 1,600 files, with no known independent validation of those files. XML: the XML Schema test suite contains more than 40,000 independently validated tests.
ADL: openEHR tools are developed by one company, and there is one open-source reference model. XML: there are more than 30 XML editors, open source and proprietary, from as many companies, plus additional tools in the XML family: XSLT, XQuery, XLink and XProc.
ADL: the FOSS Java reference model has not been thoroughly tested and validated. XML: there are at least 3 widely used XML parsers/validators, open source and proprietary, from different companies and communities.
ADL: the only ADL courses are from Ocean Informatics, plus a few startup courses taught by non-experts. XML: XML is taught in all computer science programs as well as online.
ADL: there are zero books on ADL. XML: O'Reilly has 54 books on XML; Amazon has 11,890 results for Books: "xml".

QUESTION BREAK

MODELING CLINICAL MODELS IN MLHIM: THE HEART OF HEALTHCARE IT STANDARDIZATION

CLINICAL KNOWLEDGE MODELING: FUNDAMENTALS
Modeling clinical data is a complex task: it requires deep knowledge of the specific clinical domain and at least an intermediate understanding of data types. It is a core activity in healthcare IT, and the only way to produce Big Data in healthcare responsibly. Even well-designed clinical data models in conventional software are not interoperable. Multilevel-model software is interoperable, and it requires thoughtful clinical knowledge modeling.

CLINICAL MODELS IN MULTILEVEL MODELING
In multilevel modeling, the information ecosystem is structured in (at least) two levels:
The Reference Model: the generic information model shared by the ecosystem
The Domain Model: the definition of constraints on the Reference Model for each medical concept

| Multilevel Model  | openEHR              | MLHIM                               |
| Domain Model      | Archetype            | Concept Constraint Definition (CCD) |
| Language          | ADL                  | XML Schema 1.1                      |
| # of DMs/concept  | 1                    | n                                   |
| Governance        | Top-down, consensus  | Bottom-up, merit                    |

CONCEPT CONSTRAINT DEFINITION (CCD)
In MLHIM, CCDs are XML Schemas that define constraints on the Reference Model in order to model clinical concepts. CCDs can be validated against the corresponding MLHIM Reference Model by third-party applications. The CCD schema informs the application developer of the structure of a valid data instance for each concept modeled for that system. If the CCD is made public, any recipient of a data instance coming from that application can store, validate, query, etc., that data instance.
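To make the idea concrete, here is a minimal sketch of what a CCD-style constraint could look like. The type names, UUID and value ranges are invented for this illustration and are not taken from an actual MLHIM release; the point is only that plain XML Schema facets express the clinical constraints.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical CCD fragment (all names invented for illustration).
     A CCD is an XML Schema whose types constrain generic Reference Model
     types; here a temperature quantity is narrowed to a clinical range. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Constrained magnitude: only values plausible for body temperature. -->
  <xs:simpleType name="ct-225d1f44-97cc-4ac2-b6c1-0e2dd65d2c5b-magnitude">
    <xs:restriction base="xs:decimal">
      <xs:minInclusive value="25.0"/>
      <xs:maxInclusive value="45.0"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Units fixed to Celsius for this concept. -->
  <xs:simpleType name="ct-225d1f44-97cc-4ac2-b6c1-0e2dd65d2c5b-units">
    <xs:restriction base="xs:string">
      <xs:enumeration value="Cel"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```

Any off-the-shelf XML Schema validator can then reject instances outside these constraints, with no MLHIM-specific tooling required.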

CCD HIGH-LEVEL STRUCTURE
CCD → Care, Demographic or AdminEntry → Cluster → DvAdapter (or nested Cluster) → DataType
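A hypothetical data-instance skeleton mirroring that structure (element names are invented, human-readable stand-ins; actual MLHIM element names are UUID-based identifiers):

```xml
<!-- Hypothetical instance skeleton, Entry -> Cluster -> DvAdapter -> DataType.
     Names are invented for readability; real element names are UUID-based. -->
<admin-entry>
  <cluster>
    <dv-adapter>
      <dv-quantity>
        <magnitude>37.2</magnitude>
        <units>Cel</units>
      </dv-quantity>
    </dv-adapter>
  </cluster>
</admin-entry>
```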

MLHIM DATATYPES FOR CCDS
Ordered: Quantified (DvCount, DvQuantity, DvRatio), DvOrdinal, DvTemporal
Unordered: DvString (with or without enumeration), DvCodedString, DvMedia, DvParsable
Also: DvInterval, ReferenceRange

MLHIM ELEMENTS: PRINCIPLES
The element names of a CCD do not carry any semantics. Since element names are structural identifiers, this is in keeping with the best practices for healthcare knowledge artifact identifiers, as first proposed by Dr. James Cimino (circa 1988):
"Characteristic #3 - Dumb Identifiers: An identifier itself should not have meaning. If an identifier is comprised of other identifiers that have been combined, then the composite identifier is inherently unstable. If the circumstances that related the composite identifiers together in the first place change, the resulting identifier must also change."

MLHIM CCDS: TECHNICAL ASPECTS
CCDs are the equivalent of an archetype in CEN 13606 and openEHR. They may be defined at any level, for any application use. complexType definitions may be reused in multiple CCDs. CCDs persist for all time and are not versioned; this is essential for data integrity across time. All element names are unique identifiers (Type 4 UUIDs), with a few exceptions.
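As a small sketch of this naming scheme (the `ct-` prefix is an assumption made for the example, needed only because XML names may not begin with a digit):

```python
import uuid

def new_ccd_type_name(prefix: str = "ct") -> str:
    """Generate a meaning-free ('dumb') identifier for a CCD complexType.

    Uses a random Type 4 UUID so the name carries no semantics at all;
    the prefix exists only because XML names cannot start with a digit.
    """
    return f"{prefix}-{uuid.uuid4()}"

name = new_ccd_type_name()
print(name)  # e.g. "ct-3f2b8c1e-..." (random on each run)
```

Because the identifier is random, nothing about the clinical concept can be read from, or broken by, the name itself.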

CCD GOVERNANCE MODEL
Artifact governance in MLHIM consists of maintaining a copy of the CCDs and Reference Models. These can live on the web at the specified location, or locally, referenced using the standard XML Catalog tools. Because of the naming conventions, changes to the MLHIM Reference Model do not impact previously defined CCDs or data. This maintains accurate semantics for all time.
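For example, a catalog entry of the following shape (URL and local path are invented for illustration) lets a validator resolve a CCD's canonical web address to a local copy, so validation works offline and the published artifact is never modified:

```xml
<!-- Hypothetical OASIS XML Catalog entry: maps a CCD's canonical web
     location to an immutable local copy of the same schema file. -->
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <uri name="https://example.org/ccd/ccd-225d1f44-97cc-4ac2-b6c1-0e2dd65d2c5b.xsd"
       uri="file:///opt/mlhim/ccds/ccd-225d1f44-97cc-4ac2-b6c1-0e2dd65d2c5b.xsd"/>
</catalog>
```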

MLHIM RESOURCES PUTTING INNOVATION INTO PRACTICE

MLHIM REFERENCE MODEL
The release version is available at
The development version is available at

CCD GENERATOR (CCD-GEN)
A CCD editor maintained by the MLHIM Laboratory. It generates CCDs according to the corresponding MLHIM Reference Model, and CCDs are automatically validated. Other products include:
A sample data instance
A JSON serialization of the data instance
A sample HTML form
Modules for the R programming language to pull MLHIM data into R data frames for processing and analysis

OTHER MLHIM TOOLS
MLHIM Application Platform & Learning Environment (MAPLE): an MLHIM repository using an SQL database for persistence, with a browser and a REST interface.
MLHIM XML Instance Converter (MXIC): a utility to convert MLHIM CCD XML instances to use short UUIDs, and to convert them to JSON and back again to XML. It demonstrates how mobile apps can pass smaller data files over the wire to an API that expects these formats and can convert them back to full XML instances for validation.
Form2CCD: a web application to build a form and create a CCD from it (work in progress).
Constraint Definition Designer (CDD): a FOSS CCD editor (work in progress).

IN BRIEF: CONCLUSIONS AND A VIEW TO THE FUTURE

MLHIM IS BIG DATA READY
MLHIM uses standard XML technologies and embedded RDF to define the syntax and semantics. The semantics are in the CCD and can easily be exchanged or referenced via the web; their RDF can be queried, analyzed and linked using standard tools. MLHIM data can be stored in SQL or NoSQL databases. Examples are on GitHub for eXist-DB (XML) and SQLite3 (easily ported to PostgreSQL, MySQL, Oracle, etc.). We also have experience with MLHIM data in a MarkLogic NoSQL cloud cluster environment. In addition to native XML databases, the small, document-oriented nature of MLHIM data is a perfect fit for document databases such as MongoDB and CouchDB. MLHIM XML data can easily be round-trip converted to JSON for permanent storage and/or as an exchange serialization via REST APIs.
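The XML-to-JSON round trip can be sketched in a few lines. This is a simplified illustration for flat instances with invented element names, not the actual MXIC converter; a real converter must also handle attributes, namespaces and deeper nesting.

```python
import json
import xml.etree.ElementTree as ET

def xml_to_json(xml_text: str) -> str:
    """Convert a flat XML instance to JSON: root tag -> object of child tag/text pairs."""
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: {child.tag: child.text for child in root}})

def json_to_xml(json_text: str) -> str:
    """Rebuild the flat XML instance from the JSON produced above."""
    data = json.loads(json_text)
    (root_tag, children), = data.items()
    root = ET.Element(root_tag)
    for tag, text in children.items():
        ET.SubElement(root, tag).text = text
    return ET.tostring(root, encoding="unicode")

# Invented example instance: a flat temperature reading.
xml_in = "<temperature><magnitude>37.2</magnitude><units>Cel</units></temperature>"
round_tripped = json_to_xml(xml_to_json(xml_in))
print(round_tripped == xml_in)  # prints True for this flat example
```

Because the JSON form is smaller and native to most web stacks, it suits REST exchange, while the XML form is kept for schema validation.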

OUR VISION OF THE FUTURE
There is already an intuition inside the healthcare IT world that conventional EMRs are inadequate for collecting reliable data at the point of care. The real Big Data in healthcare will come from purpose-specific applications modeled by domain experts; the hardware of choice for those apps is mobile computing. The other source of Big Data in healthcare will be the Internet of Things. All MLHIM-compliant data will participate in a semantically interoperable health information ecosystem.

THANK YOU! /mlhim2