Topics in Biomedical Informatics

Informatics for Integrating Biology and the Bedside (i2b2)
Antonio Cusano
Computer Science & Engineering Department
The University of Connecticut
371 Fairfield Road, Box U-255
Storrs, CT
Spring 2011

Overview
- Introduction to i2b2
- Modeling the i2b2 Data Model
- Overview of the i2b2 Software Tools
- Using the i2b2 Software
- Overview of the i2b2 Hive Cells
- Example Use Case Scenario
- Notable Projects & Usage in BMI
- Evaluating i2b2
- Summary

Background & Motivation
- The rise of Electronic Medical Record Systems (EMRS) holds great promise for clinical research
  - Integrating medical record data with clinical research data is increasingly important
- But many challenges exist:
  - EMRS are typically built with the “single patient” in mind, so it is difficult to observe trends in data across many patients
  - How do we “clean” EMR data at a global, enterprise level without compromising it? Removing some data on behalf of person X could be a devastating loss to person Y
  - How do we maintain patient privacy?

Background & Motivation
What do we need?
- A system that supports queries cutting across multiple patients
  - More dependent on standard descriptors
- A system that can process and understand complex queries and specifications
- A system that can integrate medical record data and clinical research data
  - Provide a robust data model
- A system that protects the privacy of patients
Solution?

Introducing i2b2: Informatics for Integrating Biology and the Bedside
- One of seven NIH Roadmap National Centers for Biomedical Computing
  - Funded under the NIH Common Fund
  - Part of the networked national effort to build the infrastructure for biomedical computing in the nation
- Established in 2004
- Based at Partners HealthCare in Boston, Massachusetts
  - Non-profit, integrated health system founded by Brigham and Women’s Hospital and Massachusetts General Hospital
- Principal Investigator: Isaac Kohane, M.D., Ph.D., Professor of Pediatrics at Harvard Medical School

Mission Statement
Overcome two major obstacles:
- The computational challenges of discovery across the large, heterogeneous data sets routinely obtained in clinical care
- The lack of knowledge of genomic-level physiology and how to study it
Therefore, the goals of i2b2 are:
- To provide clinical researchers with the software tools necessary to collect and integrate medical record and clinical research data in the genomics age
- By creating a software suite that constructs and manages the modern clinical research chart

The Clinical Research Chart
[Diagram: the i2b2 Software Tools, namely the i2b2 Hive and the Clinical Research Chart]

i2b2 Software – Design Objectives
Design focused around several goals:
- Provide a secure presentation of patient information for research purposes
- Provide a software framework that can be easily extended
- Provide secure communication capabilities for that framework
- Provide a flexible data model tuned to the needs of patient-specific information
  - Timely and scalable query performance
  - Adaptable to new and unanticipated representations of health care information

Identifying the Data Model Requirements
Developers identified these key requirements for constructing the i2b2 data model:
- Integration of data from distributed and differently structured databases
  - To perform comprehensive and integrative analyses
- Separation of research data from daily operational or transactional data
  - Eliminates performance impact on operational systems and maintains data integrity
- Standardization of the model across systems
  - Ensures all i2b2 systems share the same data model, enabling data sharing
- Ease of use by end users

Dimensional Modeling
Model the database using two concepts:
- Facts: the quantitative or factual data being queried
- Dimensions: descriptions of the various facts

Star Schema
- Has a central “fact” table in which each row represents a single fact
  - A fact is an observation of a patient: diagnoses, procedures, genetic data, lab data, health history, demographics data, etc.
  - An observation is not the same thing as an event: observations are recorded by a specific observer, within a specific time range, regarding a specific concept
- The fact table is surrounded by dimension tables
  - Four dimension tables: Concept, Provider, Visit, Patient
  - They contain the descriptors that characterize the facts
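To make the fact/dimension split concrete, here is a minimal sketch of a star schema in SQLite (Python's built-in `sqlite3`). The table names mirror those used by the i2b2 CRC (`observation_fact`, `patient_dimension`, `visit_dimension`, `provider_dimension`, `concept_dimension`), but the column lists are a heavily trimmed assumption, and concept codes such as `DX:ASTHMA` are made up for illustration.

```python
import sqlite3

# Simplified i2b2-style star schema: one central fact table and four dimensions.
# Column lists are a trimmed, illustrative subset (an assumption, not the full CRC schema).
SCHEMA = """
CREATE TABLE patient_dimension  (patient_num INTEGER PRIMARY KEY, sex_cd TEXT);
CREATE TABLE visit_dimension    (encounter_num INTEGER PRIMARY KEY, patient_num INTEGER, start_date TEXT);
CREATE TABLE provider_dimension (provider_id TEXT PRIMARY KEY, name_char TEXT);
CREATE TABLE concept_dimension  (concept_cd TEXT PRIMARY KEY, concept_path TEXT, name_char TEXT);
CREATE TABLE observation_fact (
    encounter_num INTEGER REFERENCES visit_dimension(encounter_num),
    patient_num   INTEGER REFERENCES patient_dimension(patient_num),
    concept_cd    TEXT    REFERENCES concept_dimension(concept_cd),
    provider_id   TEXT    REFERENCES provider_dimension(provider_id),
    start_date    TEXT,   -- when the observation was recorded
    nval_num      REAL    -- numeric value, if the observation has one
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO patient_dimension VALUES (?, ?)", [(1, "F"), (2, "M")])
    conn.executemany("INSERT INTO visit_dimension VALUES (?, ?, ?)",
                     [(10, 1, "2010-03-01"), (20, 2, "2010-04-01")])
    conn.execute("INSERT INTO provider_dimension VALUES ('P1', 'Example Provider')")
    conn.executemany("INSERT INTO concept_dimension VALUES (?, ?, ?)",
                     [("DX:ASTHMA", "\\Diagnoses\\Asthma\\", "Asthma"),
                      ("LAB:FEV1", "\\Labs\\PFT\\FEV1\\", "FEV1")])
    conn.executemany("INSERT INTO observation_fact VALUES (?, ?, ?, ?, ?, ?)",
                     [(10, 1, "DX:ASTHMA", "P1", "2010-03-01", None),
                      (10, 1, "LAB:FEV1", "P1", "2010-03-01", 2.45),
                      (20, 2, "LAB:FEV1", "P1", "2010-04-01", 3.10)])
    return conn


def patients_with_concept(conn: sqlite3.Connection, name: str) -> list:
    """Join the fact table to the concept dimension and filter on a descriptor."""
    rows = conn.execute(
        """SELECT DISTINCT f.patient_num
           FROM observation_fact AS f
           JOIN concept_dimension AS c ON c.concept_cd = f.concept_cd
           WHERE c.name_char = ?
           ORDER BY f.patient_num""",
        (name,),
    ).fetchall()
    return [row[0] for row in rows]


conn = build_demo_db()
print(patients_with_concept(conn, "FEV1"))  # [1, 2]
```

Each fact row points outward to its dimensions; a query filters on a dimension descriptor (here, the concept name) and returns patients from the central table.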

Star Schema

Star Schema Performance
- Enterprise repositories and project-specific local repositories can contain very large amounts of data
- As a result, the central fact table can grow very large, impacting performance
- Indexes on that table are critical to maintaining stable performance
- Use system-specific enhancements when possible
  - e.g. SQL Server databases can add a clustered index to a table, which stores its rows physically sorted by the index key
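A minimal sketch of why indexing the fact table matters, using SQLite as a stand-in for the production database. The index name `ix_fact_concept_patient` is hypothetical. SQLite does not offer SQL Server's `CREATE CLUSTERED INDEX`, so this only shows a composite index being picked up by the query planner, compared with a full scan on an unindexed column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE observation_fact (
           encounter_num INTEGER, patient_num INTEGER, concept_cd TEXT,
           provider_id TEXT, start_date TEXT)"""
)

# Hypothetical composite index: concept first (the usual filter), then patient,
# so a "count patients with concept X" query can be answered from the index alone.
conn.execute(
    "CREATE INDEX ix_fact_concept_patient ON observation_fact (concept_cd, patient_num)"
)


def query_plan(sql: str, params: tuple = ()) -> list:
    """Return the 'detail' text of each EXPLAIN QUERY PLAN row (4th column)."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


plan = query_plan(
    "SELECT COUNT(DISTINCT patient_num) FROM observation_fact WHERE concept_cd = ?",
    ("DX:ASTHMA",),
)
for step in plan:
    print(step)  # the search step should name ix_fact_concept_patient

# For comparison, a filter on an unindexed column falls back to a full table scan.
print(query_plan("SELECT * FROM observation_fact WHERE provider_id = ?", ("P1",)))
```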

i2b2 Software – Purpose
Serves two primary use cases:
- Enterprise-wide repurposing and distribution of medical record data for research
  - Enable high-performance collection of medical record data for querying and distribution
  - Enable discovery within the data on a wide scale
- Usage of medical record data in clinical studies
How do we achieve these use cases? Use the i2b2 Software Tools!
- The i2b2 Hive
- The Clinical Research Chart, a core component of the i2b2 Hive

What is the i2b2 Hive?
- A collection of interoperable services provided by i2b2 cells
- Each cell behaves as a functional service
  - Cells are loosely coupled (independence)
  - Cells do not know their relative locality (proximity)
- Cells are connected and communicate with each other using web services
  - Can be invoked manually by the user
  - Can be invoked automatically by the system workflow
What do we notice?
- Highly modular architecture
- Highly scalable

What are i2b2 Cells?
- The i2b2 cell is the basic building block of the i2b2 environment
- An application “wrapped” into a functional unit
- Encapsulates business logic as well as access to data objects behind standard web service interfaces
  - Supported services include REST and SOAP
  - Communication uses XML messages
[Diagram: i2b2 web service interfaces. A cell wraps its Business Logic, Data Access, and Data Objects behind REST and SOAP interfaces, exchanging XML over HTTP]

Structure of the XML Message
XML schema that defines:
- A header for communication management
- A header for the message request/response
- A message body that contains the data
  - For example, patient sets with their phenotypic (clinical) and genotypic data
  - References to other data objects (images, attachments)
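The three-part layout can be sketched with Python's `xml.etree.ElementTree`. Only `message_body` appears elsewhere in these slides; the root element `request`, the child names `message_header` and `request_header`, and every field inside them are illustrative assumptions rather than the official i2b2 XML schema.

```python
import xml.etree.ElementTree as ET


def build_message(sending_app: str, project_id: str, payload: ET.Element) -> str:
    """Wrap a payload in the three parts the schema defines.
    Element names other than message_body are illustrative assumptions."""
    root = ET.Element("request")

    # 1. Header for communication management (who is sending, for which project).
    msg_header = ET.SubElement(root, "message_header")
    ET.SubElement(msg_header, "sending_application").text = sending_app
    ET.SubElement(msg_header, "project_id").text = project_id

    # 2. Header for the request/response itself (hypothetical timeout field).
    req_header = ET.SubElement(root, "request_header")
    ET.SubElement(req_header, "timeout_ms").text = "180000"

    # 3. Message body carrying the data.
    body = ET.SubElement(root, "message_body")
    body.append(payload)
    return ET.tostring(root, encoding="unicode")


# Example payload: a patient set referring to two (made-up) patient numbers.
patient_set = ET.Element("patient_set", id="PS-1")
for num in (101, 102):
    ET.SubElement(patient_set, "patient", patient_num=str(num))

xml_text = build_message("demo-client", "ASTHMA_PROJ", patient_set)
parsed = ET.fromstring(xml_text)
print([child.tag for child in parsed])  # ['message_header', 'request_header', 'message_body']
```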

Example XML Message Header

Example XML Message Body

Advantages of Web Services
Because all communication is in XML…
- Not limited to any single operating system
- Not limited to any single programming language
  - Cells can be developed in Microsoft .NET, Perl, Python, Java, etc.
  - Any language that supports REST or SOAP can be used
- Cells can run on Windows, Linux, and Mac OS and communicate with each other
  - e.g. cells residing on a Windows platform can talk with those on a UNIX platform
- No restriction on how simple or complex a cell can be
- XML tags the data; REST/SOAP transfers the data

But Where’s the User Interface?
- Web services do not provide a visual user interface
- The developer is required to build a client component
  - Must include a Graphical User Interface (GUI) and a control mechanism for user interaction
- Some considerations:
  - Use the web service interfaces for communication rather than a home-grown approach
  - Ensure cell-to-cell communication is maintained
  - Reuse the functionality of existing cells

How are Cells Classified?
- The i2b2 Hive is composed of a number of cells with varying importance and functionality
- Core cells are essential for operation of the Hive
  - Provide basic services
  - Written in Java using the J2EE specifications
  - Front-end clients written using the Standard Widget Toolkit (SWT), which provides a native OS look-and-feel for the user interfaces
- Optional and plug-in cells add functionality to the Hive but are not essential
- Special Hive components:
  - The Clinical Research Chart
  - The i2b2 Web Client
  - The i2b2 Workbench Application

The Clinical Research Chart (CRC)
- The implementation of the Star Schema in i2b2
- Functions as the integrated data repository for the i2b2 Hive
- Core cell of the i2b2 Hive (Data Repository Cell)
  - Requires all core cells to gain complete functionality
  - In fact, the main purpose of the other core cells is to support the activities of the CRC
- Fundamentally built to store medical data
  - Which can be accessed by any cell in the i2b2 Hive
  - Similarly, any cell can contribute data to the CRC

Useful for:
- Repurposing patient data and integrating it with genomic data and clinical trial data for clinical research
Important to note:
- Not a mechanism for searching through hospital clinical systems
- Not a transaction system for managing clinical trials

The i2b2 Web Client
- Designed for enterprise-related activities
  - e.g. selecting patients from an enterprise repository
- Written entirely in JavaScript, HTML, and CSS
  - Uses AJAX to avoid full page refreshes
- Cross-platform and compatible with most browsers
  - Known compatibility issues with IE5 and earlier
- Easy to deploy and update
Important to note:
- Can create patient sets and retrieve patient counts
- Only anonymous patient data is shown
  - Data is obfuscated by adding or subtracting a small random number to or from the available aggregate totals
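A minimal sketch of the obfuscation idea: perturb each aggregate count by a small random integer before display. The uniform offset and the `max_offset=3` default are hypothetical choices for illustration; i2b2's own obfuscation method and parameters are not described in these slides.

```python
import random
from typing import Optional


def obfuscate_count(true_count: int, max_offset: int = 3,
                    rng: Optional[random.Random] = None) -> int:
    """Shift a count by a random integer in [-max_offset, max_offset].
    The uniform offset and max_offset=3 are hypothetical, not i2b2's settings."""
    if rng is None:
        rng = random.Random()
    noisy = true_count + rng.randint(-max_offset, max_offset)  # randint is inclusive
    return max(noisy, 0)  # never display a negative patient count


rng = random.Random(42)  # seeded only so the example is repeatable
print([obfuscate_count(120, rng=rng) for _ in range(5)])
```

Clamping at zero keeps the displayed count plausible. Noise of this kind can be averaged away if the same query is re-run many times, so a real deployment would also need to consider repeated queries.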

The i2b2 Workbench Application
- Designed for project-based use
  - e.g. data manipulation, visual analytics
- Written in Java using the Eclipse framework
  - The client applications are Eclipse plug-ins that together compose the workbench application
  - Can be extended with other Java/Eclipse plug-ins
- More resource-intensive than its web companion
  - Helpful for heavy client-side processing

How to use the i2b2 Software
First, use the web or desktop client to select/query patients from the enterprise data repository (EDR)

Creating the Query
- Patient attributes are dragged from the “Terms” panel into the “Query Tool” panels
- Terms in the same panel are logically OR’d
- Terms in different panels are logically AND’d
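The panel rules (OR within a panel, AND across panels) amount to a union followed by an intersection over patient sets. A minimal sketch, assuming each term has already been resolved to the set of patient numbers that have it; the concept codes are hypothetical.

```python
from typing import Dict, List, Set


def run_query(panels: List[List[str]], facts: Dict[str, Set[int]]) -> Set[int]:
    """Evaluate query panels against a map of concept code -> patient numbers."""
    result = None
    for panel in panels:
        # Terms in the same panel are OR'd: take the union of their patient sets.
        panel_patients = set().union(*(facts.get(term, set()) for term in panel))
        # Different panels are AND'd: intersect with the patients found so far.
        result = panel_patients if result is None else result & panel_patients
    return result if result is not None else set()


facts = {
    "DX:ASTHMA": {1, 2, 3},
    "DX:COPD": {4},
    "MED:ALBUTEROL": {2, 4, 5},
    "DEM:FEMALE": {1, 2, 4},
}
# (asthma OR COPD) AND albuterol
panels = [["DX:ASTHMA", "DX:COPD"], ["MED:ALBUTEROL"]]
print(sorted(run_query(panels, facts)))  # [2, 4]
```

The patient count the client reports corresponds to the size of the resulting set.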

How to Use this Data?
Querying an EDR returns limited data:
- A patient count from the results of the query
- Aggregate counts of the demographics of these patients
This is not very useful for research purposes in its current form.
- To use this data effectively, patient sets must be saved into a new, project-specific database
  - Saved in your local i2b2 installation
- This process is known as creating a “data mart”
  - Requires IRB approval

Creation of a Data Mart
- A data mart ensures patient privacy by storing only information allowed under HIPAA regulations
  - Protected Health Information (PHI) is not included in the data mart
- Data is saved in the CRC (Star Schema DB model)

Working with the Data
Use the i2b2 Workbench Application to view and manipulate the data from your data mart

User & Hive Interaction
- When using the web or desktop client, you are not just accessing the Clinical Research Chart directly
  - In fact, most interactions draw on the functionality of many i2b2 cells
  - At a minimum, all core cells are used in some way
- What do these other cells do?
  - Project Management
  - Data Repository (CRC)
  - Ontology Management
  - Identity Management
  - File Repository
  - Workflow Management

Workflow Framework Cell
- This cell is used to process information in steps through various parts of the Hive
  - Most processed information will come to reside in the CRC or be displayed to the user
- Specifically:
  - Facilitates communication between cells
  - Manages project-specific XML data objects for users of a given project
    - These objects typically originate in other cells
    - These objects are organized in hierarchical structures that represent relationships between elements
  - Allows users to organize, label, and annotate data objects

Use Case Diagram

Operations and Descriptions

We can see the Workflow Management Cell at work in the i2b2 Web and Desktop Clients
- For example, providing hierarchical structure for concepts and patient sets

Project Management Cell
- This cell provides user authentication and manages group and role information
- User access is determined by a user’s role, which defines what actions they may perform in the Hive
  - Default role is User
  - Other roles include Manager and Administrator
  - Users can have one or more roles
- It also keeps track of which cells are part of the Hive and their locations
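Role-based checks like these can be sketched in a few lines. The permission names and the mapping from roles to permissions below are hypothetical; only the role names (User, Manager, Administrator), the default role, and the fact that a user may hold several roles come from the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# Hypothetical permissions per role; real role definitions live in the PM cell.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "USER": {"run_query", "view_counts"},
    "MANAGER": {"run_query", "view_counts", "manage_project"},
    "ADMIN": {"run_query", "view_counts", "manage_project", "manage_users"},
}


@dataclass
class Account:
    username: str
    roles: Set[str] = field(default_factory=lambda: {"USER"})  # default role: User


def is_allowed(account: Account, action: str) -> bool:
    """A user may hold several roles; any one granting the action is enough."""
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in account.roles)


alice = Account("alice")                     # only the default role
bob = Account("bob", {"USER", "MANAGER"})    # two roles
print(is_allowed(alice, "manage_project"))   # False
print(is_allowed(bob, "manage_project"))     # True
```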

Project Management Cell
- Can be accessed by either an i2b2 client or another i2b2 cell
  - Client: a user trying to log in to the client
  - Cell: checking which roles exist for a user for that cell
Authentication and Authorization Use Case Diagram:

File Repository Cell
- Fundamentally, this cell holds large data files
  - Radiological images, genetic sequences
  - These files are generally referenced from the Clinical Research Chart
- Manages the sending and receiving of these files between cells
  - Under most conditions, other cells use REST or SOAP service calls to access files in this cell
- Users can use this cell to upload files
XML request format:
    <message_body>
      <recvfile_request>
        <filename>/oasis/ABT001b/brain_324.jpg</filename>
      </recvfile_request>
    </message_body>
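Because cells exchange XML, a request must be well-formed before the File Repository cell can read it. This sketch parses the request with Python's `xml.etree.ElementTree`, pulls out the requested file name, and shows that a mismatched closing tag (for example `</recfile_request>` closing `<recvfile_request>`) is rejected by the parser. The helper `requested_filename` is hypothetical.

```python
import xml.etree.ElementTree as ET

REQUEST = """<message_body>
  <recvfile_request>
    <filename>/oasis/ABT001b/brain_324.jpg</filename>
  </recvfile_request>
</message_body>"""


def requested_filename(xml_text: str) -> str:
    """Return the file name from a recvfile_request body (hypothetical helper)."""
    body = ET.fromstring(xml_text)  # raises ET.ParseError on malformed XML
    name = body.findtext("recvfile_request/filename")
    if name is None:
        raise ValueError("no recvfile_request/filename element")
    return name.strip()


print(requested_filename(REQUEST))  # /oasis/ABT001b/brain_324.jpg

# A mismatched closing tag is rejected before any file lookup happens.
try:
    requested_filename(REQUEST.replace("</recvfile_request>", "</recfile_request>"))
except ET.ParseError as err:
    print("rejected:", err)
```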

Ontology Management Cell
- Manages the terminology and knowledge information typically used in the Hive, especially in the CRC
  - Provides descriptive terms and other information for data stored in the observation_fact table
  - This metadata is stored in one or more separate tables outside the Star Schema
- These vocabulary terms are organized in hierarchical structures (Workflow Framework)
- This information is either requested by or distributed to cells during most of the Hive’s transactions
Use Case Diagram:

Ontology Management Cell
Typical ontology table columns:
- Hierarchical level
- Full path that leads to the term
- Descriptive text value
- Is the field a synonym for another term?
- Display icon used in the user interface
- Field not used in i2b2
- Describes the ontological concept
- Extra information about the concept, in XML
- Column name in the fact table that holds the concept code
- Name of the look-up table that holds the concept code
- Name of the field that holds the concept path
- T for text or N for numeric
- SQL operator used in the WHERE clause for queries
- Dimension table path that maps to the concept
- Miscellaneous comments
- Tooltip that appears in the user interface
- Date the data was updated
- Date the data was downloaded
- Date the data was imported
- Coded value for the originating source system
- Coded value indicating term type: DOC or LAB
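Several of these columns together describe how a term becomes SQL: the fact-table column, the look-up table, the field in that table, the operator, and the dimension path. The sketch below assembles a WHERE-clause fragment from them. The `c_*` field names are assumed to follow i2b2's metadata table naming, and the exact query text i2b2 generates may differ; treat this as an illustration of the idea, not the CRC's query builder.

```python
from dataclasses import dataclass


@dataclass
class OntologyTerm:
    # Field names assumed to follow i2b2's metadata table (c_* columns).
    c_name: str             # descriptive text value
    c_facttablecolumn: str  # column in the fact table that holds the concept code
    c_tablename: str        # look-up (dimension) table holding the concept code
    c_columnname: str       # field in that table matched against c_dimcode
    c_operator: str         # SQL operator used in the WHERE clause
    c_dimcode: str          # dimension table path/value that maps to the concept


def where_clause(term: OntologyTerm) -> str:
    """Build a WHERE fragment selecting facts for this term (illustrative only)."""
    value = term.c_dimcode.replace("'", "''")  # escape quotes inside the SQL literal
    if term.c_operator.upper() == "LIKE":
        value += "%"  # a path prefix also matches every term beneath it
    col = term.c_facttablecolumn
    return (f"{col} IN (SELECT {col} FROM {term.c_tablename} "
            f"WHERE {term.c_columnname} {term.c_operator} '{value}')")


asthma = OntologyTerm("Asthma", "concept_cd", "concept_dimension",
                      "concept_path", "LIKE", "\\Diagnoses\\Asthma\\")
print(where_clause(asthma))
# concept_cd IN (SELECT concept_cd FROM concept_dimension WHERE concept_path LIKE '\Diagnoses\Asthma\%')
```

Table and column names are interpolated directly, which is only acceptable because they come from trusted ontology metadata rather than user input; the dimension value itself is quote-escaped.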

Identity Management Cell
- Manages a patient's protected health information in a manner consistent with HIPAA privacy rules
- Patient data is available only as a HIPAA-defined “Limited Data Set”
  - Patient identifiers are removed
  - Uses a “code book” that maps real patient identifiers to arbitrary patient numbers in the CRC
- Design and architecture documents for this cell are not publicly available
  - It’s a secret?
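Since the IM cell's design documents are not public, the following is only a generic pseudonymization sketch, not i2b2's implementation: a code book assigns each real medical record number (MRN) a random, unrelated patient number, and identifier fields are dropped before a record moves on. The identifier field names are hypothetical; a real HIPAA Limited Data Set is defined by the list of identifiers HIPAA specifies.

```python
import secrets
from typing import Dict, Set


class CodeBook:
    """Maps real medical record numbers (MRNs) to arbitrary patient numbers.
    Only the identity-management side would hold this; the CRC sees patient_num only."""

    def __init__(self) -> None:
        self._mrn_to_num: Dict[str, int] = {}
        self._used: Set[int] = set()

    def patient_num(self, mrn: str) -> int:
        # Reuse an existing number so every record for one patient lines up.
        if mrn not in self._mrn_to_num:
            num = secrets.randbelow(10**9)
            while num in self._used:  # re-draw on the (rare) collision
                num = secrets.randbelow(10**9)
            self._used.add(num)
            self._mrn_to_num[mrn] = num
        return self._mrn_to_num[mrn]


# Hypothetical direct identifiers to strip; HIPAA's own list is longer.
DIRECT_IDENTIFIERS = {"mrn", "name", "street_address", "phone"}


def deidentify(record: dict, book: CodeBook) -> dict:
    """Drop direct identifiers and substitute the code-book patient number."""
    clean = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    clean["patient_num"] = book.patient_num(record["mrn"])
    return clean


book = CodeBook()
row = deidentify({"mrn": "MRN-0042", "name": "Jane Doe", "phone": "555-0100",
                  "zip": "06269", "dx": "asthma"}, book)
print(sorted(row))  # ['dx', 'patient_num', 'zip']
```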

Optional i2b2 Cells: Natural Language Processing Cell
- Manipulates text reports to extract specific terms and knowledge from them
  - Extracts concepts such as diagnoses and smoking status
  - These concepts are then used to build various representations of the data
- Concepts returned are divided into three categories:
  - UMLS concepts: mapping parts of the document to concepts in the Unified Medical Language System (UMLS) database
  - Regular Expression concepts: matching document text against a set of regular expression rules
  - Smoking Status concepts: a classification model trained on human-annotated smoking-related sentences
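Of the three concept categories, the regular-expression one is the easiest to illustrate. A minimal sketch, assuming a hypothetical rule set that maps made-up concept codes to patterns; it is not the NLP cell's actual rule file, and smoking status in particular is handled in the cell by a trained classifier, not by patterns like these.

```python
import re
from typing import Dict, List

# Hypothetical rule set: made-up concept code -> pattern. Illustrative only.
RULES: Dict[str, re.Pattern] = {
    "NLP:SMOKER_CURRENT": re.compile(
        r"\b(current(ly)?\s+smok(er|es|ing)|smokes\s+\d+\s+packs?)\b", re.IGNORECASE),
    "NLP:SMOKER_NEVER": re.compile(r"\b(never\s+smoked|non-?smoker)\b", re.IGNORECASE),
    "NLP:ASTHMA": re.compile(r"\basthma(tic)?\b", re.IGNORECASE),
}


def extract_concepts(note: str) -> List[str]:
    """Return the codes of all rules that match somewhere in the note, in rule order."""
    return [code for code, pattern in RULES.items() if pattern.search(note)]


print(extract_concepts("34-year-old non-smoker with a history of asthma."))
# ['NLP:SMOKER_NEVER', 'NLP:ASTHMA']
print(extract_concepts("Currently smoking 1 pack per day; asthmatic since childhood."))
# ['NLP:SMOKER_CURRENT', 'NLP:ASTHMA']
```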

Natural Language Processing

Optional i2b2 Cells: Pulmonary Function Test (PFT) Processing Cell
- Parses a pulmonary function report and extracts embedded test values
  - The report must be in a specific format
  - Returned values may be stored in the CRC and used in queries or other types of analyses
- The report format is not specified in any official i2b2 documentation, but published examples give some idea of the required format
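Since the official report format is not documented, this parser assumes a hypothetical layout of one `<test name>: <number> <unit>` result per line and skips any line that does not fit. It illustrates only the step of extracting embedded test values.

```python
import re
from typing import Dict, Tuple

# Hypothetical line format "<test name>: <number> <unit>". The real report layout
# is not documented by i2b2, so this pattern is an assumption.
RESULT_LINE = re.compile(
    r"^\s*(?P<test>[A-Za-z0-9/]+)\s*:\s*(?P<value>-?\d+(?:\.\d+)?)\s*(?P<unit>\S*)\s*$"
)


def parse_pft(report: str) -> Dict[str, Tuple[float, str]]:
    """Extract {test name: (value, unit)} from every line matching RESULT_LINE."""
    results: Dict[str, Tuple[float, str]] = {}
    for line in report.splitlines():
        match = RESULT_LINE.match(line)
        if match:  # headers and free-text lines are skipped
            results[match["test"].upper()] = (float(match["value"]), match["unit"])
    return results


REPORT = """PULMONARY FUNCTION REPORT
FEV1: 2.45 L
FVC: 3.40 L
FEV1/FVC: 72 %
Interpretation: mild obstruction"""

print(parse_pft(REPORT))
# {'FEV1': (2.45, 'L'), 'FVC': (3.4, 'L'), 'FEV1/FVC': (72.0, '%')}
```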

Pulmonary Function Report Format

Example Use Case Scenario
Clinical Asthma Investigation
- Available data includes:
  - Text notes from the asthma clinic
  - Reports from pulmonary function tests
- Questions…
  - How and when is the data extracted?
  - How and when is the data encrypted?
  - How and when is the data collated into something meaningful and useful?
- Answer! Use the functionality provided by the i2b2 Hive
  - Core cells and Optional cells
  - Once data is gathered and processed, add it to the Clinical Research Chart

Workflow Requirements
- The Workflow Framework (WF) cell controls communication between the other cells
- Identify the cells needed for this workflow:
  - Identity Management, Data Repository, Natural Language Processing, and PFT Processing

Workflow Continued…
- The available data is uploaded through the Identity Management (IM) cell
  - Names, medical record numbers, and other sensitive information are resolved and retained in the IM cell
  - Data is encrypted (using the Advanced Encryption Standard (AES) block cipher)
- Data is added to the Clinical Research Chart (CRC)
  - The CRC now contains a HIPAA-compliant limited data set
[Diagram: Text Notes and PFT Reports are encrypted on their way into the CRC]

Workflow Continued…
- With our newly defined data set, we want to extract concepts from the text notes
  - e.g. hospital discharge summaries, EMR data
- The WF cell retrieves the notes from the CRC and sends them to the Natural Language Processing (NLP) cell
- The NLP cell processes the notes and extracts specific information from them to form concepts
- These concepts are then pushed back to the CRC

Workflow Continued…
- Similarly, we want to extract concepts from the PFT reports
- The WF cell retrieves the PFT reports from the CRC and sends them to the PFT Processing cell
- The PFT cell parses the reports one by one and generates concepts from them
- The values associated with each test record are placed back into the CRC
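The routing performed by the WF cell in these steps can be sketched as a loop: pull documents of each type from the CRC, pass them to the cell that understands them, and write the returned concepts back as facts. Everything here is a toy stand-in: the `ClinicalResearchChart` class, the cell functions, and the concept codes are hypothetical, and plain function calls replace the web-service messages that real cells exchange.

```python
from typing import Callable, Dict, List, Tuple

Cell = Callable[[str], List[str]]  # a processing cell: document text -> concept codes


class ClinicalResearchChart:
    """Toy stand-in for the CRC: stored documents plus extracted facts."""

    def __init__(self) -> None:
        self.documents: Dict[str, List[Tuple[int, str]]] = {"note": [], "pft": []}
        self.facts: List[Tuple[int, str]] = []  # (patient_num, concept_cd)


def run_workflow(crc: ClinicalResearchChart, routes: Dict[str, Cell]) -> None:
    """Send each document type to its cell and push the concepts back into the CRC."""
    for doc_type, cell in routes.items():
        for patient_num, text in crc.documents.get(doc_type, []):
            for concept in cell(text):
                crc.facts.append((patient_num, concept))


def nlp_cell(text: str) -> List[str]:
    # Hypothetical NLP cell: flags notes that mention asthma.
    return ["NLP:ASTHMA"] if "asthma" in text.lower() else []


def pft_cell(text: str) -> List[str]:
    # Hypothetical PFT cell: flags reports that contain an FEV1 result.
    return ["PFT:FEV1"] if "FEV1" in text else []


crc = ClinicalResearchChart()
crc.documents["note"].append((1, "Follow-up visit for asthma."))
crc.documents["pft"].append((1, "FEV1: 2.45 L"))
run_workflow(crc, {"note": nlp_cell, "pft": pft_cell})
print(crc.facts)  # [(1, 'NLP:ASTHMA'), (1, 'PFT:FEV1')]
```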

Workflow Complete
- Data has now been fully processed and saved in the CRC, and is available for viewing and manipulation
  - Using the i2b2 Workbench Application
  - Allows the investigator to query, analyze, and display the data
- What did we get from this process?
  - Medication and diagnosis concepts related to asthma from the NLP-processed notes
  - Physical findings and physiological test results extracted from the PFTs
  - The result: a wealth of valuable data to aid the clinical investigator in clinical discovery

Crimson Project
Developed by Dr. Lynn Bry of Partners HealthCare
Project objectives:
- Provide enhanced sample management within i2b2
- Support prospective and retrospective sample collection
  - Prospective: requests typically routed to an external information system
  - Retrospective: requests typically directed toward an existing repository or registry
- Three i2b2 cells:
  - Regulatory cell
  - Sample Cohort Management cell
  - Sample Registry cell

Crimson Project – The Cells
- Regulatory Cell
  - Manages the regulatory aspects of sample requests and sample data management within i2b2
  - De-identification of data
  - Connection management with external systems
  - Storing PHI encryption keys
- Sample Cohort Management Cell
  - Focused on translating, broadcasting, and tracking i2b2 sample requests
- Sample Registry Cell
  - Manages the import of sample data from external sources

Crimson Project – Architecture

SMArt Project for i2b2
Developed by Nich Wattanasin
Project objective:
- Develop a common API for SMArt applications to interact with the i2b2 platform
- Project in the very early stages of development
  - First release: September 14, 2010
  - Only 20 revisions since (as of April 2011)
- Current capabilities:
  - A handful of functions that return targeted information from a single patient record
  - Accomplished via REST calls
  - Results returned in RDF/XML format
  - Plug-in for the i2b2 Web Client

SMArt Project – Current Functions
- Get Medications: returns a list of medications for a specific patient record
- Get Demographics: returns the demographic information for a specific patient record
- Get Problems: returns a list of problems for a specific patient record
- Get Allergies: returns a list of allergies for a specific patient record
GET id}/{medications | demographics | problems | allergies}/

SMArt Dashboard Web Client Plug-in
- Ability to embed SMArt Apps directly into the i2b2 Web Client
- Ability to access i2b2 patient data via the SMArt connect model/project common API

i2b2 Research Data Warehouse
- A custom i2b2 implementation at Cincinnati Children’s Hospital Medical Center (CCHMC)
  - Developed by the CCHMC i2b2 team
- Project adds several new capabilities to the i2b2 platform:
  - Ability to view clinical data in a web-based form (similar to a chart review)
  - Ability to enter data directly into i2b2 using forms, e.g. data that is not collected from an EMR
  - Ability to run reports and perform custom visualizations on the data
- CCHMC uses i2b2 to create a “research data warehouse”
  - But what is a research data warehouse?

What is a Research Data Warehouse?
According to CCHMC…
- A research data warehouse is a repository that integrates information on patients from multiple sources:
  - Electronic health records
  - Lab results
  - Genetic and research data
  - Birth registry data
  - Government data (Medicaid)
- What it is used for: cohort identification, hypothesis generation
- What it is NOT used for: decision support, clinical trials, real-time alerts

i2b2 Research Data Warehouse

Evaluating i2b2
Performance statistics provided by Partners HealthCare
- Query performance (on their primary i2b2 system)
  - 4.6 million patient records
  - 1.2 billion observations (facts) on these patients (observation_fact table)
  - Queries requesting patient counts on this repository typically complete within 10 seconds, many within several milliseconds
- Data mart initialization performance
  - 2.6 million patient records
  - 550 million observations (facts) on these patients
  - Machine with 8 x 3 GHz processors and 32 GB RAM
  - Completed building in approximately 1 hour and 15 minutes

Evaluating i2b2
Scalability
- Enabled by the modular nature of the i2b2 cell and its ease of integration into the Hive
  - Encourages development outside the i2b2 core team
  - Fosters rapid software development
Usability
- Simple installation process to get started
- Intuitive user interfaces
- Wealth of documentation publicly available online
  - Reduced learning curve
Interoperability
- Works on a variety of operating systems, web browsers, and server technologies
- Not limited to commercial technologies

Limitations
- Users can naturally create project-level repositories (data marts) from an enterprise-level repository
  - Can we update our project databases with fresh enterprise data?
  - Can we upload our project data, regardless of origin, into the enterprise repository?
- Such capabilities are not currently supported in i2b2
  - The numerous policies these functions require are difficult to implement

Limitations
- i2b2 cells communicate through web services, which are not always flexible
  - Perhaps we want to execute our own SQL queries?
  - Not possible: queries are limited to the pre-specified queries and result sets dictated by the cells
- How do we overcome this?
  - Developers plan to introduce a second SQL access layer to the CRC
  - Will allow greater flexibility with queries
  - But will need to comply with security rules and a strict ontology

Summary
- Presented i2b2 as a software tool and data model aiding clinical research and discovery
  - Addresses the inherent challenges of integrating medical record and clinical research data
- Relatively young project, but on the fast track for growth and development
  - Roadmap for future releases, with a new version currently in release candidate (RC) status
- Adoption and usage in BMI look promising
  - Approximately 17 sites outside of Partners HealthCare are engaged in i2b2 projects

Thank You!
