Data Normalization, Dr. Stan Huff

# 2 Acknowledgements
Tom Oniki, Joey Coyle, Craig Parker, Yan Heras, Cessily Johnson, Roberto Rocha, Lee Min Lau, Alan James, and many, many others…

# 3 What are detailed clinical models? Why do we need them?

# 4 A diagram of a simple clinical model
[Figure: Clinical Element Model for Systolic Blood Pressure. A SystolicBPObs node (key SystolicBP, data 138 mmHg) has two quals: BodyLocation (data Right Arm) and PatientPosition (data Sitting).]
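A minimal Python sketch of the same structure, assuming a CEM instance can be held as a key, a value with units, and a list of typed qualifiers. The class and field names (`Qualifier`, `kind`, `qual`) are hypothetical, not part of the CEM tooling.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Qualifier:
    # A qualifier model: its type (e.g. BodyLocation) and its data value.
    kind: str
    data: str


@dataclass
class SystolicBPObs:
    # The key says what was measured; value and unit hold the data.
    key: str
    value: float
    unit: str
    quals: List[Qualifier] = field(default_factory=list)

    def qual(self, kind: str) -> str:
        """Return the data of the first qualifier of the given type."""
        for q in self.quals:
            if q.kind == kind:
                return q.data
        raise KeyError(kind)


obs = SystolicBPObs(
    key="SystolicBP",
    value=138,
    unit="mmHg",
    quals=[Qualifier("BodyLocation", "Right Arm"), Qualifier("PatientPosition", "Sitting")],
)
```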

# 5 Need for a standard model
A stack of coded items is ambiguous (SNOMED CT):
–Numbness of right arm and left leg: Numbness ( ) Right ( ) Arm ( ) Left ( ) Leg ( )
–Numbness of left arm and right leg: Numbness ( ) Left ( ) Arm ( ) Right ( ) Leg ( )
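To make the ambiguity concrete, the sketch below compares the two statements first as unordered stacks of codes and then as structured findings that bind each laterality to its body site. Codes are shown by name because the slide elides the SNOMED CT identifiers; the `Finding` model is hypothetical.

```python
from collections import Counter
from typing import FrozenSet, NamedTuple

right_arm_left_leg = ["Numbness", "Right", "Arm", "Left", "Leg"]
left_arm_right_leg = ["Numbness", "Left", "Arm", "Right", "Leg"]

# As unordered stacks (multisets), the two statements are indistinguishable.
stacks_identical = Counter(right_arm_left_leg) == Counter(left_arm_right_leg)


class Finding(NamedTuple):
    # Hypothetical structured model: laterality is bound to its site.
    finding: str
    laterality: str
    site: str


case_a: FrozenSet[Finding] = frozenset(
    {Finding("Numbness", "Right", "Arm"), Finding("Numbness", "Left", "Leg")}
)
case_b: FrozenSet[Finding] = frozenset(
    {Finding("Numbness", "Left", "Arm"), Finding("Numbness", "Right", "Leg")}
)
```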

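# 6 What if there is no model?
[Figure: hematocrit results as displayed at two sites. Site #1 uses two result names, "Hct, manual" and "Hct, auto" (%); Site #2 uses a single name, "Hct" (%), with the method (Manual, Auto) shown separately.]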

# 7 HL7 V2.x Messages
Site 1:
OBX|1|CE|4545-0^Hct, manual||37||%|
OBX|1|CE|4544-3^Hct, auto||35||%|
Site 2:
OBX|1|CE| ^Hct||37||%|….|manual|
OBX|1|CE| ^Hct||35||%|….|auto|
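One way to see what normalization must do with these messages is a small Python sketch that maps both sites' OBX segments to one (analyte, method, value, unit) form. Assumptions not in the slides: results use value type NM with units in OBX-6; Site 2 carries the method in OBX-17 (Observation Method), since the slide elides the intervening fields; and the `PRECOORDINATED` table is illustrative. The sample segments are well-formed variants of the lines above.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative table (assumption): Site 1's pre-coordinated LOINC codes
# mapped to an (analyte, method) pair.
PRECOORDINATED = {
    "4545-0": ("Hct", "manual"),
    "4544-3": ("Hct", "auto"),
}


@dataclass(frozen=True)
class HctObservation:
    analyte: str
    method: Optional[str]
    value: float
    unit: str


def normalize_obx(segment: str) -> HctObservation:
    """Map one OBX segment from either site to a single post-coordinated form."""
    fields = segment.split("|")
    if fields[0] != "OBX":
        raise ValueError("not an OBX segment")
    parts = fields[3].split("^")
    code = parts[0]
    name = parts[1] if len(parts) > 1 else ""
    value = float(fields[5])  # OBX-5: observation value
    unit = fields[6]          # OBX-6: units
    if code in PRECOORDINATED:
        # Site 1: the method is implied by the code itself.
        analyte, method = PRECOORDINATED[code]
    else:
        # Site 2 (assumed layout): generic name plus the method in OBX-17.
        analyte = name
        method = fields[17] if len(fields) > 17 and fields[17] else None
    return HctObservation(analyte, method, value, unit)


site1 = normalize_obx("OBX|1|NM|4545-0^Hct, manual||37|%|")
site2 = normalize_obx("OBX|1|NM|^Hct||37|%|||||F||||||manual")
```

Both calls yield the same normalized observation, which is the point of mapping into a single model.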

# 8 Too many ways to say the same thing
A single name/code and value
–Hct, manual is 37 %
Two names/codes and values
–Hct is 37 %
–Method is manual (spun)

# 9 Model fragment in XML
Pre-coordinated representation
–Hct, manual (LOINC ), 37 %
Post-coordinated (compositional) representation
–Hct (LOINC ), Method Manual, 37 %
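A rough sketch of the two XML shapes using Python's `xml.etree.ElementTree`. The element and attribute names (`observation`, `key`, `data`, `qualifier`) are hypothetical, not the CEM schema; 4545-0 is the Site 1 code for "Hct, manual" from the HL7 example, and the post-coordinated Hct code is left out because the slide elides it.

```python
import xml.etree.ElementTree as ET


def precoordinated() -> ET.Element:
    # One code carries both the analyte and the method.
    obs = ET.Element("observation")
    ET.SubElement(obs, "key", code="4545-0", name="Hct, manual")
    ET.SubElement(obs, "data", value="37", unit="%")
    return obs


def postcoordinated() -> ET.Element:
    # The analyte and the method are separate, composable parts.
    obs = ET.Element("observation")
    ET.SubElement(obs, "key", name="Hct")  # LOINC code elided on the slide
    ET.SubElement(obs, "data", value="37", unit="%")
    method = ET.SubElement(obs, "qualifier", name="Method")
    ET.SubElement(method, "data", value="Manual")
    return obs


print(ET.tostring(precoordinated(), encoding="unicode"))
print(ET.tostring(postcoordinated(), encoding="unicode"))
```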

# 10 Isosemantic Models
[Figure: two isosemantic models of the same result. Precoordinated Model: HematocritManualModel, with key HematocritManual (LOINC ) and data 37 %. Post coordinated Model (Storage Model): HematocritModel, with key Hematocrit (LOINC ) and data 37 %, plus a qual, HematocritMethodModel, with key Hematocrit Method and data Manual.]
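Isosemantic models are only useful if instances can be translated between them without loss. A minimal sketch of that round trip, assuming a simple lookup table between the pre-coordinated key and a (key, method) pair; `HematocritAuto` is a hypothetical second entry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrecoordinatedHct:
    key: str      # e.g. "HematocritManual"
    value: float
    unit: str


@dataclass(frozen=True)
class PostcoordinatedHct:
    key: str      # e.g. "Hematocrit"
    method: str   # e.g. "Manual"
    value: float
    unit: str


# Hypothetical mapping; a real system would source it from terminology services.
PRE_TO_POST = {
    "HematocritManual": ("Hematocrit", "Manual"),
    "HematocritAuto": ("Hematocrit", "Auto"),
}
POST_TO_PRE = {pair: key for key, pair in PRE_TO_POST.items()}


def to_storage(obs: PrecoordinatedHct) -> PostcoordinatedHct:
    key, method = PRE_TO_POST[obs.key]
    return PostcoordinatedHct(key, method, obs.value, obs.unit)


def from_storage(obs: PostcoordinatedHct) -> PrecoordinatedHct:
    return PrecoordinatedHct(POST_TO_PRE[(obs.key, obs.method)], obs.value, obs.unit)
```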

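# 11 Relational database implications
If the patient’s hematocrit is <= 35 then ….

Method pre-coordinated into the observation type:
| Patient Identifier | Date and Time | Observation Type | Observation Value | Units |
|---|---|---|---|---|
| | /4/2005 | Hct, manual | 37 | % |
| | /19/2005 | Hct, auto | 35 | % |

Observation type and method in separate columns:
| Patient Identifier | Date and Time | Observation Type | Method | Observation Value | Units |
|---|---|---|---|---|---|
| | /4/2005 | Hct | manual | 37 | % |
| | /19/2005 | Hct | auto | 35 | % |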
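The query difference can be reproduced with Python's built-in `sqlite3`. Table and column names are hypothetical, and dates are omitted because the slide's dates are incomplete.

```python
import sqlite3


def build_tables(conn: sqlite3.Connection) -> None:
    # Layout 1: the method is folded into the observation type.
    conn.execute("CREATE TABLE obs_precoord (patient_id TEXT, obs_type TEXT, value REAL, unit TEXT)")
    # Layout 2: observation type and method live in separate columns.
    conn.execute(
        "CREATE TABLE obs_postcoord (patient_id TEXT, obs_type TEXT, method TEXT, value REAL, unit TEXT)"
    )
    conn.executemany(
        "INSERT INTO obs_precoord VALUES (?, ?, ?, ?)",
        [("P1", "Hct, manual", 37, "%"), ("P1", "Hct, auto", 35, "%")],
    )
    conn.executemany(
        "INSERT INTO obs_postcoord VALUES (?, ?, ?, ?, ?)",
        [("P1", "Hct", "manual", 37, "%"), ("P1", "Hct", "auto", 35, "%")],
    )


def low_hct_precoord(conn: sqlite3.Connection):
    # Every hematocrit variant must be enumerated; a missed one is silently skipped.
    return conn.execute(
        "SELECT value FROM obs_precoord "
        "WHERE obs_type IN ('Hct, manual', 'Hct, auto') AND value <= 35"
    ).fetchall()


def low_hct_postcoord(conn: sqlite3.Connection):
    # One predicate on obs_type covers every method.
    return conn.execute(
        "SELECT value FROM obs_postcoord WHERE obs_type = 'Hct' AND value <= 35"
    ).fetchall()
```

A naive `obs_type = 'Hct'` against the first layout returns nothing, which is exactly the trap the slide points at.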

# 12 More complicated items:
Signs, symptoms
Diagnoses
Problem list
Family History
Use of negation: "No Family Hx of Cancer"
Description of a heart murmur
Description of breath sounds
–"Rales in right and left upper lobes"
–"Rales, rhonchi, and egophony in right lower lobe"
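A sketch, under assumed names, of why these items need structure beyond a single code: one finding can apply to several locations, several findings can share one location, and negation has to be stated explicitly rather than inferred from absence.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Finding:
    name: str
    locations: Tuple[str, ...]
    negated: bool = False


# "Rales in right and left upper lobes": one finding, two locations.
upper_lobes = [Finding("Rales", ("right upper lobe", "left upper lobe"))]

# "Rales, rhonchi, and egophony in right lower lobe": three findings, one location.
right_lower = [Finding(n, ("right lower lobe",)) for n in ("Rales", "Rhonchi", "Egophony")]

# "No Family Hx of Cancer": an explicit negated statement.
no_fhx_cancer = Finding("Family history of cancer", (), negated=True)


def findings_at(findings: List[Finding], location: str) -> List[str]:
    """Names of the asserted (non-negated) findings at a location."""
    return [f.name for f in findings if location in f.locations and not f.negated]
```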

# 13 What do we model?
All health care data, including:
–Allergies
–Problem lists
–Laboratory results
–Medication and diagnostic orders
–Medication administration
–Physical exam and clinical measurements
–Signs, symptoms, diagnoses
–Clinical documents
–Procedures
–Family history, medical history and review of symptoms

# 14 How are the models used?
EMR: data entry screens, flow sheets, reports, ad hoc queries
–Basis for application access to clinical data
Data normalization
–Creation of maps from models in the local system to the standard model
Target for the output of structured data from NLP
–Validation of data as it is stored in the database
Phenotype algorithms (decision logic)
–Basis for referencing data in phenotype definitions
Does NOT dictate physical storage strategy

# 15 Model Source Expression (CDL)
model BloodPressurePanel is panel {
  key code(BloodPressurePanel_KEY_ECID);
  statement SystolicBloodPressureMeas systolicBloodPressureMeas optional
    systolicBloodPressureMeas.methodDevice.conduct(methodDevice)
    systolicBloodPressureMeas.bodyLocationPrecoord.conduct(bodyLocationPrecoord)
    systolicBloodPressureMeas.bodyPosition.conduct(bodyPosition)
    systolicBloodPressureMeas.relativeTemporalContext.conduct(relativeTemporalContext)
    systolicBloodPressureMeas.subject.conduct(subject)
    systolicBloodPressureMeas.observed.conduct(observed)
    systolicBloodPressureMeas.reportedReceived.conduct(reportedReceived)
    systolicBloodPressureMeas.verified.conduct(verified);
  statement DiastolicBloodPressureMeas diastolicBloodPressureMeas optional ….
  statement MeanArterialPressureMeas meanArterialPressureMeas optional ….
  qualifier MethodDevice methodDevice optional;
    md.code.domain(BloodPressureMeasurementDevice_DOMAIN_ECID);
  qualifier BodyLocationPrecoord bodyLocationPrecoord optional;
    blp.code.domain(BloodPressureBodyLocationPrecoord_DOMAIN_ECID);
  modifier Subject subject optional;
  attribution Observed observed optional;
  attribution ReportedReceived reportedReceived optional;
  attribution Verified verified optional;
}
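The CDL above does not spell out what `conduct` does. Assuming it passes a panel-level qualifier (such as the device or body location) down to each member measurement that does not set its own, the behavior could be sketched as follows; the class names and the precedence rule are hypothetical.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List


@dataclass
class Measurement:
    name: str
    value: float
    unit: str
    quals: Dict[str, str] = field(default_factory=dict)


@dataclass
class BloodPressurePanel:
    quals: Dict[str, str]
    measurements: List[Measurement]

    # Qualifiers conducted to members (a subset of the CDL list; assumption).
    CONDUCTED = ("methodDevice", "bodyLocationPrecoord", "bodyPosition")

    def conducted(self) -> List[Measurement]:
        """Return copies of the members with panel qualifiers filled in where absent."""
        result = []
        for m in self.measurements:
            merged = dict(m.quals)
            for name in self.CONDUCTED:
                # The member's own value wins over the panel's (assumption).
                if name in self.quals and name not in merged:
                    merged[name] = self.quals[name]
            result.append(replace(m, quals=merged))
        return result


panel = BloodPressurePanel(
    quals={"bodyLocationPrecoord": "Right arm", "bodyPosition": "Sitting"},
    measurements=[
        Measurement("systolic", 138, "mmHg"),
        Measurement("diastolic", 82, "mmHg", {"bodyLocationPrecoord": "Left arm"}),
    ],
)
members = panel.conducted()
```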

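# 16 Compiler
[Diagram: a CE Source File is read by the CE Translator into an "In Memory" form, from which other artifacts are generated: HTML, Java Class, XML Template (.xsd), and the question-marked candidates SMArt RDF?, openEHR Archetype?, HL7 RIM Static Models?, OWL?, UML?]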
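The one-source, many-outputs idea can be sketched as an in-memory model with independent emitters. This minimal Python version is hypothetical (it is not the CE Translator) and emits only two of the listed targets: an XML Schema fragment and an HTML table.

```python
import html
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)


@dataclass
class FieldDef:
    name: str
    xsd_type: str
    optional: bool = False


@dataclass
class ModelDef:
    # The "in memory" form that every emitter reads.
    name: str
    fields: List[FieldDef]


def to_xsd(model: ModelDef) -> str:
    schema = ET.Element(f"{{{XS}}}schema")
    ctype = ET.SubElement(schema, f"{{{XS}}}complexType", name=model.name)
    seq = ET.SubElement(ctype, f"{{{XS}}}sequence")
    for fd in model.fields:
        ET.SubElement(seq, f"{{{XS}}}element", name=fd.name, type=fd.xsd_type,
                      minOccurs="0" if fd.optional else "1")
    return ET.tostring(schema, encoding="unicode")


def to_html(model: ModelDef) -> str:
    rows = "".join(
        "<tr><td>{}</td><td>{}</td><td>{}</td></tr>".format(
            html.escape(fd.name), html.escape(fd.xsd_type),
            "optional" if fd.optional else "required")
        for fd in model.fields
    )
    return f"<table><caption>{html.escape(model.name)}</caption>{rows}</table>"
```

Adding another target means adding another emitter over the same `ModelDef`, without touching the parser.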

# 17 Artifacts Used
–CDL Model Definition
–CEM XML Schema
–HL7 Data Source
–CEM XML Instance

# 18 StandardLabObsQuantitative - CDL Definition
import StandardLabObs;
import ReferenceRangeNar;
model StandardLabObsQuantitative is statement extends StandardLabObs {
  key domain(StandardLabObsQuantitative_KEY_VALUESET_ECID);
  data PQ primaryPQValue unit.domain(UnitsOfMeasure_VALUESET_ECID)
    alternate {
      match CD secondaryCDValue code.domain(LabValue_VALUESET_ECID);
      match CD altCDValue code.domain(LabValue_VALUESET_ECID);
      otherwise ST altSTValue;
    };
  qualifier ReferenceRangeNar referenceRangeNar card(0..1);
  constraint primaryPQValue.isNullReasonCode.domain(LabNullFlavor_VALUESET_ECID);
  constraint abnormalInterpretation.CD.code.domain(AbnormalInterpretationNumericNom_VALUESET_ECID);
  constraint deltaFlag.CD.code.domain(DeltaFlagNumericNom_VALUESET_ECID);
}
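A rough Python reading of the `alternate` block and the unit-domain constraint: prefer a physical quantity whose unit is in the units value set, then a coded value, otherwise free text. The value-set contents are invented placeholders, and the two CD alternates are collapsed into one for brevity.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder value-set contents (invented); the CDL only names the value sets.
UNITS_OF_MEASURE = {"%", "g/dL", "mmol/L"}
LAB_VALUE = {"POS", "NEG"}


@dataclass
class LabValue:
    kind: str                  # "PQ", "CD", or "ST"
    value: str
    unit: Optional[str] = None


def choose_value(raw: str, unit: Optional[str] = None) -> LabValue:
    """Try PQ, then a coded value (CD), otherwise fall back to a string (ST)."""
    try:
        float(raw)
    except ValueError:
        pass
    else:
        # Mirrors unit.domain(UnitsOfMeasure_VALUESET_ECID) on primaryPQValue.
        if unit not in UNITS_OF_MEASURE:
            raise ValueError(f"unit {unit!r} is not in the units value set")
        return LabValue("PQ", raw, unit)
    if raw in LAB_VALUE:
        return LabValue("CD", raw)
    return LabValue("ST", raw)
```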

# 19 StandardLabObsQuantitative - Schema Snippet

# 20 HL7 Source Instance
MSH|^~\&|OADD|153|DADD|XNEPHA| ||ORU^R01| |T|2.2||||
EVN|R01| |
PID|| | |007261|WHYLING^KAYLIE^O'TEST|| |F| |W|||(801) |(866) |||| | |
PV1||O|XNEPHA^XNEPHA^^IM||||28826^Allyson^Josephine^O'TEST|^||||||||||OP|||||||||||||||||||||||||| ||||||||
ORC|RE||F506556|||||||||28826^Allyson^Josephine^O'TEST||||^|
OBR||^|F506556^|HCT^HEMATOCRIT|R|| |||70011^ROSEN,AUBRY^O'TEST||| |^|28826^Allyson^Josephine^O'TEST||||M||||C|F|RFP^RFP|^^^^^R|^~^~^|||||||
OBX|1|NM|HCT^HEMATOCRIT|1.1|48|%|||R||F||| |IM^Performed at Inte|58528^ANDERSON^MARK|
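A minimal parser for a message like this splits segments on carriage returns and fields on the separator declared in MSH. The sketch uses only the standard library and is not the parser used in the project; note that MSH is numbered differently from other segments because MSH-1 is the separator character itself.

```python
import re
from typing import Dict, List, Tuple

Segments = Dict[str, List[List[str]]]


def parse_hl7(message: str) -> Tuple[Segments, str]:
    """Split an HL7 v2 message into segments and fields, keyed by segment ID."""
    if not message.startswith("MSH"):
        raise ValueError("message must start with an MSH segment")
    field_sep = message[3]      # MSH-1: the field separator itself
    component_sep = message[4]  # first of the MSH-2 encoding characters
    segments: Segments = {}
    # HL7 separates segments with carriage returns; accept newlines too.
    for raw in re.split(r"[\r\n]+", message.strip()):
        fields = raw.split(field_sep)
        segments.setdefault(fields[0], []).append(fields)
    return segments, component_sep


def observations(message: str) -> List[Dict[str, str]]:
    segments, comp = parse_hl7(message)
    result = []
    for obx in segments.get("OBX", []):
        # For non-MSH segments, list index n holds field n (obx[5] is OBX-5).
        parts = obx[3].split(comp)
        result.append({
            "code": parts[0],
            "name": parts[1] if len(parts) > 1 else "",
            "value": obx[5],
            "unit": obx[6],
        })
    return result
```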

# 21 LabObsQuantitative - XML Instance Snippet
[XML snippet, partially preserved: LOINC; HCT; equals; %; 48; … …]

# 22 Issues
Different groups use models differently
–NLP versus EMR
Structuring the models to meet more than one use
Options for different granularities of models
–Hematocrit model, model of pneumonia
–Quantitative lab result model, x-ray finding
Terminology integration: use of standards and terminology services
Models for "rare" kinds of data
–Medication being taken by a friend, not recommended by the physician

# 23 Questions?

Data Normalization Dr. Christopher Chute

IHC-Medication, Mayo, IHC LAB to CEM
[Diagram: two UIMA pipelines between Mirth and SharpDb. Medications: HL7 (Meds) → HL7 Initializer → IHC-GCN-to-RxNorm Annotator (IHC RxNorm resource) → Drug CEM CAS Consumer. Labs: HL7 (Labs) → HL7 Initializer → Generic-LAB-Annotator (Mayo and IHC LOINC resources) → LAB CEM CAS Consumer.]

UIMA Normalization Pipeline
• Convert HL7 V2.x Lab / Med Order Messages into CEM XML instances
–Load SofA with HL7 message
–Create Segment Objects in CAS
–Normalize Segments in CAS
–Transform Segments into CEM instances
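The four steps can be mimicked with a plain Python pipeline over a shared dictionary standing in for the CAS. This is not the UIMA API; the stage names and the dictionary layout are hypothetical.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for a UIMA CAS: the subject of analysis ("sofa")
# plus whatever each stage adds.
Cas = Dict[str, Any]


def load_sofa(message: str) -> Cas:
    return {"sofa": message}


def create_segments(cas: Cas) -> None:
    # splitlines() also accepts HL7's carriage-return segment separators.
    cas["segments"] = [line.split("|") for line in cas["sofa"].strip().splitlines()]


def normalize_segments(cas: Cas) -> None:
    # Example normalization only: trim stray whitespace in every field.
    cas["segments"] = [[f.strip() for f in seg] for seg in cas["segments"]]


def transform_to_cem(cas: Cas) -> None:
    # Emit one CEM-like dict (key, value, unit) per OBX segment.
    cas["cem"] = [
        {"key": seg[3].split("^")[0], "value": seg[5], "unit": seg[6]}
        for seg in cas["segments"]
        if seg[0] == "OBX" and len(seg) > 6
    ]


PIPELINE: List[Callable[[Cas], None]] = [create_segments, normalize_segments, transform_to_cem]


def run(message: str) -> Cas:
    cas = load_sofa(message)
    for stage in PIPELINE:
        stage(cas)
    return cas
```

Keeping each stage as a function over the shared structure mirrors the annotator idea: stages can be added or swapped without changing the others.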

Mayo, IHC LAB to CEM
[Diagram: HL7 pipe-delimited messages → Mirth → HL7 (XML) → HL7 Initializer → Generic-LAB-Annotators (Mayo and IHC LOINC resources) → LAB CEM CAS Consumer → Mirth → SharpDb.]
One of the new pipelines created to normalize HL7 2.x lab messages into CEM instances. We pre-processed the HL7 messages, converting them from HL7 pipe syntax into HL7 XML format.
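The pre-processing step (pipe syntax to XML) was done in Mirth. As a stand-in, here is a simplified Python converter that emits one element per segment and one child per non-empty field; it is not the official HL7 v2 XML encoding, which also breaks fields into typed components. The root element name is hypothetical.

```python
import xml.etree.ElementTree as ET


def pipe_to_xml(message: str, root_tag: str = "HL7Message") -> str:
    """Simplified conversion: one element per segment, one child per non-empty field.

    Field elements are named SEG.n by position. For MSH this is off by one,
    because MSH-1 is the separator itself; this sketch does not adjust for it.
    """
    root = ET.Element(root_tag)
    for line in message.replace("\r", "\n").split("\n"):
        if not line.strip():
            continue
        fields = line.split("|")
        segment = ET.SubElement(root, fields[0])
        for position, value in enumerate(fields[1:], start=1):
            if value:
                ET.SubElement(segment, f"{fields[0]}.{position}").text = value
    return ET.tostring(root, encoding="unicode")
```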

UIMA Pipeline Flow: Mayo, IHC LAB to CEM
[Diagram: an HL7 message is loaded into the CAS (SOFA = HL7-XML) and passes through four stages: Initialize, Parse (PID, PV1, and OBX segment objects in the CAS), Normalize, and Transform (CEM).]

Normalization Anatomy: Lab Annotators
[Diagram: syntactic integrity (HL7 Segment Parser, Date-Time To ISO Format) and LOINC lookups (IHC codes to LOINC table, Mayo codes to LOINC table), backed by LexGrid/CTS2 Terminology Services.]
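Two of these annotators are easy to sketch: converting HL7 date-time strings to ISO 8601, and looking up a site's local code in a code-to-LOINC table. The supported timestamp lengths and the single table entry are assumptions for illustration; the real IHC and Mayo tables are not shown in the slides.

```python
from datetime import datetime
from typing import Dict, Tuple

# Only YYYYMMDD[HHMM[SS]] is handled; fractional seconds and
# time-zone offsets are out of scope for this sketch.
_FORMATS = {8: "%Y%m%d", 12: "%Y%m%d%H%M", 14: "%Y%m%d%H%M%S"}


def hl7_ts_to_iso(ts: str) -> str:
    fmt = _FORMATS.get(len(ts))
    if fmt is None:
        raise ValueError(f"unsupported HL7 timestamp: {ts!r}")
    parsed = datetime.strptime(ts, fmt)
    return parsed.date().isoformat() if len(ts) == 8 else parsed.isoformat()


# Hypothetical per-site table keyed by (site, local code).
LOCAL_TO_LOINC: Dict[Tuple[str, str], str] = {
    ("IHC", "HCT"): "4544-3",  # illustrative entry only
}


def to_loinc(site: str, local_code: str) -> str:
    try:
        return LOCAL_TO_LOINC[(site, local_code)]
    except KeyError:
        raise LookupError(f"no LOINC mapping for {site}:{local_code}") from None
```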

Architectural Opportunities
[Diagram: inputs in HL7 2.x and Mayo CDA arrive through Mirth; syntactic annotators (time, syntax, etc.) and semantic annotators run in the pipeline; CAS To XML emits CEM format, which is returned to Mirth.]

Tactical Next Step Enhancements
• Single CEM for multiple OBX segments
• Efficiently utilize terminology services
• Incorporate a library for HL7 clean-up routines
• Increase scope of vocabulary standardization
• Enhancements for the Drug Annotator
–Context enhancement issue
–Drug name surprises

Additional Vocabularies
• Review sources used for normalization opportunities, e.g.:
–In HL7 OBR segments
  • Standardize Service ID (codes)
–In HL7 OBX segments
  • Standardize units
  • Standardize reference ranges
  • Standardize normal flags
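A sketch of what such standardization could look like for OBX units, normal flags, and simple numeric reference ranges. The synonym tables are hypothetical; in practice they would come from terminology services rather than hard-coded dictionaries.

```python
import re
from typing import Optional, Tuple

# Hypothetical synonym tables.
UNIT_SYNONYMS = {"percent": "%", "pct": "%", "g/dl": "g/dL"}
FLAG_SYNONYMS = {"HI": "H", "HIGH": "H", "LO": "L", "LOW": "L", "NORMAL": "N"}

_RANGE = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*-\s*(-?\d+(?:\.\d+)?)\s*$")


def standardize_unit(unit: str) -> str:
    u = unit.strip()
    return UNIT_SYNONYMS.get(u.lower(), u)


def standardize_flag(flag: str) -> str:
    f = flag.strip().upper()
    return FLAG_SYNONYMS.get(f, f)


def parse_reference_range(text: str) -> Optional[Tuple[float, float]]:
    """Parse 'low-high' ranges; return None for anything else (e.g. '<5')."""
    m = _RANGE.match(text)
    return (float(m.group(1)), float(m.group(2))) if m else None
```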

Drug Name Disambiguation
Real patient data presented a unique case in drug names: "ToDAY" is a brand name for cephapirin sodium. This presents an interesting named-entity disambiguation use case.

Where Persistence Fits In…
[Diagram, components in the order shown: IHC (Backend CDR Systems) → Mirth Connect → IHC NwHIN Aurion Gateway → SHARP NwHIN Aurion Gateway → Mirth Connect → UIMA Pipeline → CEM Instance Database; also shown: a Mayo EDT System.]

Persistence Channels
• One channel per model
• Data stored as an XML instance of the model
• Fields extracted from the XML to use as indices
• XML Schema defined for each model
• Stored using database transactions

| CEM Model | Mirth Channel |
|---|---|
| Administrative Diagnosis | CemAdminDxToDatabase |
| Standard Lab Panel | CemLabToDatabase |
| Ambulatory Medication Order | CemMedicationToDatabase |

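General Channel Design
[Diagram: the Channel picks up CEM XML instances from an Input Message Directory and writes them to the Persistence Store through a connector; processed messages go to a Processed Message Directory and failed ones to an Error Message Directory.]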
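The directory-based flow can be sketched in Python with `pathlib` and `shutil`. The `store` callback stands in for the persistence store connector and is hypothetical; in this sketch a failure of any kind routes the file to the error directory.

```python
import shutil
from pathlib import Path
from typing import Callable


def run_channel(input_dir: Path, processed_dir: Path, error_dir: Path,
                store: Callable[[str], None]) -> None:
    """Store each CEM XML file once, then move it to the processed or error directory."""
    processed_dir.mkdir(parents=True, exist_ok=True)
    error_dir.mkdir(parents=True, exist_ok=True)
    for path in sorted(input_dir.glob("*.xml")):
        try:
            store(path.read_text(encoding="utf-8"))
        except Exception:
            # Deliberately broad for the sketch: any failure goes to the error directory.
            target = error_dir
        else:
            target = processed_dir
        shutil.move(str(path), str(target / path.name))
```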

SharpDB a CEM Instance Database

Database Tables

| Table | Purpose |
|---|---|
| Demographics | Patient demographics (one row per patient) |
| PatientCrossReference | Associates internal Patient ID with Site Patient ID (one row per cross map) |
| SourceData | Information about the original source data (one row per instance message) |
| PatientData | CEM instance XML with some source information (one row per instance message) |
| IndexData | Indices into the XML instance (multiple rows per instance message: AdminDx, one per message; Lab, one per observation; Medication, one per orderable item) |
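A cut-down version of the `PatientData` and `IndexData` tables in SQLite, extracting one index row per lab observation inside a single transaction. The XML element names (`labPanel`, `observation`, `code`) and the column names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

SCHEMA = """
CREATE TABLE PatientData (id INTEGER PRIMARY KEY, patient_id INTEGER, model TEXT, xml TEXT);
CREATE TABLE IndexData (data_id INTEGER REFERENCES PatientData(id), name TEXT, value TEXT);
"""


def store_lab_instance(conn: sqlite3.Connection, patient_id: int, xml_text: str) -> int:
    """Store one instance plus one index row per observation, atomically."""
    root = ET.fromstring(xml_text)
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO PatientData (patient_id, model, xml) VALUES (?, ?, ?)",
            (patient_id, root.tag, xml_text),
        )
        data_id = cur.lastrowid
        for obs in root.iter("observation"):
            conn.execute(
                "INSERT INTO IndexData VALUES (?, ?, ?)",
                (data_id, "code", obs.get("code")),
            )
    return data_id
```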

Patient Demographics
• Each message contains patient demographics
• Demographics are created from the first received message, based on the site patient ID
• An internal Patient ID is created and cross-mapped to the site patient ID
• SharpDB is keyed off the internally generated Patient ID
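The first-message rule amounts to a get-or-create on the (site, site patient ID) pair. A minimal SQLite sketch with hypothetical column names:

```python
import sqlite3


def init(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE Demographics (
            patient_id INTEGER PRIMARY KEY AUTOINCREMENT,
            name TEXT);
        CREATE TABLE PatientCrossReference (
            site TEXT,
            site_patient_id TEXT,
            patient_id INTEGER REFERENCES Demographics(patient_id),
            PRIMARY KEY (site, site_patient_id));
    """)


def internal_patient_id(conn: sqlite3.Connection, site: str,
                        site_patient_id: str, name: str) -> int:
    """Return the internal ID, creating demographics and a cross map on first sight."""
    row = conn.execute(
        "SELECT patient_id FROM PatientCrossReference WHERE site = ? AND site_patient_id = ?",
        (site, site_patient_id),
    ).fetchone()
    if row:
        return row[0]
    with conn:  # create both rows in one transaction
        cur = conn.execute("INSERT INTO Demographics (name) VALUES (?)", (name,))
        pid = cur.lastrowid
        conn.execute(
            "INSERT INTO PatientCrossReference VALUES (?, ?, ?)",
            (site, site_patient_id, pid),
        )
    return pid
```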

Running in a Cloud…
• Various images were installed:
–NwHIN Gateway provided by Aurion
–Mirth Connect, our interface engine
–UIMA pipelines of various sorts
–MySQL database for persistence
–JBoss / Drools rules engine
All open source, running in an Ubuntu cloud!

[Diagram: SHARP Hardware Infrastructure. Components shown: a Cloud Server (Cloud, Walrus, Cluster, and Storage Controllers); Node Servers 1 to 3, each with a Node Controller and VMs; persistence and image storage; a Build/Backup Server; a private switch; an admin client interface reached over VPN/LAN to manage the cloud; and VM users connecting over VPN/LAN to their instances.]

| Hardware | No. of Physical Machines | CPU | Memory | Disk | Disk Space | Networking | Functionality | No. of NICs |
|---|---|---|---|---|---|---|---|---|
| Cloud Server | 1 | 8 | 12 GB | 10000 RPM SAS | 1 TB | 1 Gbps | Cloud, Walrus, Cluster and Storage Controller | 4 |
| Node Server | 1 | 8 | 32 GB | 10000 RPM SAS | 1 TB | 1 Gbps | Node Controller | 4 |
| Node Server | | | … GB | 10000 RPM SAS | 600 GB/600 GB | 1 Gbps | Node Controller | 4 |
| Node Server | 1 | 8 | 64 GB | 7200 RPM SATA | 1 TB/1 TB | 1 Gbps | Node Controller | 4 |
| Node Server | 1 | 8 | 32 GB | 10000 RPM SAS | 4 TB | 1 Gbps | Node Controller | 4 |
| Build/Backup Server | 1 | 2 | 8 GB | 7200 RPM SATA | 2 TB | 1 Gbps | Build and Backup | 2 |
| Storage | | | | … RPM SAS | 7.5 TB | 1 Gbps | Persistence and Image Storage | |
| Storage | | | | … RPM SAS | 3.6 TB | 1 Gbps | Volume Storage | |
| Cisco 48 Port Switch | 2 | | | | | 1 GB | | |

Data Normalization Summary
• Initial "tracer shot" at Data Normalization
–Cloud-based processing using open source tools
–Proof of concept, UIMA for Data Normalization
–Move on to new problems / solutions…
–Opportunities exist:
  • Add new annotators (modules) to the pipelines
  • Widen usage and scope of vocabulary services
  • Switch to real live flows and add HOSS clean-up routines
  • Various tweaks in NLP algorithms