i2b2 National Center for Biomedical Computing: i2b2 Clinical Research Chart and Hive Architecture. Henry Chueh, Shawn Murphy, Isaac Kohane (PI).

Presentation transcript:


Summary
- Background
- Intro to the Clinical Research Chart (CRC)
- Hive / Cell Software Architecture
- More details on establishing and using the CRC

Background
- Clinical documentation is…clinical
- Lack of systematic approach for organizing clinical data for research
- Ownership issues are unique
- Consent issues are a challenge

Driving Biological Projects
- Asthma
- Hypertension
- Huntington’s Disease
- Diabetes

Clinical Research Chart (CRC)
- Organize and transform clinical data to maximize its utility for research
- Develop an Application and Database framework to serve this goal
- Establish an architecture that allows data from different studies done on this platform to be integrated

Design of Clinical Research Chart
[Diagram. Services: Ontology, Consent/Tracking, Application Pool, Management. Legend: data flowing, custom interfaces, SOAP/HTTP interfaces, a program. Data sources shown: HL7 messages, text files, XML, database, clinical trials. Data store: CRC DB.]

Design of Clinical Research Chart
[Diagram, extended. Services: Ontology, Consent/Tracking, Application Pool, Management. Legend: data flowing, custom interfaces, SOAP/HTTP interfaces, a program. Components: Data pipeline/workflow application; Pheno/Genotype Database; CRC DB; Visualization and Analysis of database contents. Data sources shown: HL7 messages, text files, XML, database, clinical trials.]

i2b2 Skeletal Data Flow
[Diagram. Clinical Research Chart: shared data, study specific data. Enterprise Systems: Registration, ADT, Labs, Reports, Clinical Notes, etc. Enterprise data source (RPDR). Local Systems: systems not gathered into enterprise data warehouses. Applications and services: Annotation UI, EDC applications, Annotation Service, EDC Service. Workflows: i2b2 ETL workflow, Analytic workflow.]

Overall Themes
- Framework to allow development of application services in a maximally decoupled fashion
- Linux and Windows OS support
- Java and C++ programming languages
- Use Cases for construction of CRC come from Driving Biology Projects and experience with clients of Partners Research Patient Data Registry

Focus on Workflow
- Necessary for both pre-CRC and post-CRC processes
- Needed for scientific flexibility
- Implies a consistent environment for data pipelining and flow control

i2b2 Hive
- Formed as a collection of interoperable Cells, or services
- Loosely coupled
- Makes no assumptions about proximity
- Connected by Web services
- Activated by a workflow engine that forms basis of choreography among Cells for complex interactions
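As a rough illustration of the Hive idea, the sketch below models Cells as named services and a workflow plan as an ordered list of Cell names. Everything in it is an assumption made for illustration: the `Hive` class, the two example Cells, the `mrn` key, and the use of in-process Python functions in place of real Web service calls are not taken from i2b2.

```python
from typing import Callable, Dict, List

# A Cell is modelled as a function from message to message. In a real Hive
# each call would go over a Web service; plain functions stand in here.
Cell = Callable[[dict], dict]


class Hive:
    """Hypothetical registry of Cells plus a very small workflow runner."""

    def __init__(self) -> None:
        self._cells: Dict[str, Cell] = {}

    def register(self, name: str, cell: Cell) -> None:
        self._cells[name] = cell

    def run(self, plan: List[str], message: dict) -> dict:
        # The plan is the choreography: which Cells to call, and in what order.
        for name in plan:
            message = self._cells[name](message)
        return message


def deidentify(message: dict) -> dict:
    # Drop a direct identifier (the "mrn" key is an illustrative assumption).
    return {k: v for k, v in message.items() if k != "mrn"}


def extract_concepts(message: dict) -> dict:
    # Toy concept extraction: flag one keyword found in the narrative text.
    concepts = ["asthma"] if "asthma" in message["text"].lower() else []
    return {**message, "concepts": concepts}


hive = Hive()
hive.register("deidentify", deidentify)
hive.register("extract_concepts", extract_concepts)

result = hive.run(
    ["deidentify", "extract_concepts"],
    {"mrn": "demo-001", "text": "History of asthma."},
)
print(result)
```

Because the runner only knows Cell names, a Cell can be replaced or moved without touching the plan, which is the loose coupling the slide describes.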

Complex choreography

i2b2 Cell
- Behaves as a functional service
- Separates interactions conceptually into transactions and semantics
- Focuses on facilitating transactions with simple semantics (e.g., datatype)
- Leaves deep semantics to be defined by the services provided by a Cell
- Does not restrict language implementation
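One way to picture the split between transactions and semantics is a message envelope whose header the platform can check while the body stays opaque. The sketch below is a guess at that shape, not the i2b2 message format: `Envelope`, `dispatch`, `ConceptExtractionCell`, and the datatype strings are all hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    message_id: str
    datatype: str  # simple semantics: all the platform needs to inspect
    body: str      # deep semantics: interpreted only by the receiving Cell


class ConceptExtractionCell:
    """Hypothetical Cell; its keyword list is a stand-in for real NLP."""

    accepts = "text/plain"

    def handle(self, envelope: Envelope) -> Envelope:
        text = envelope.body.lower()
        found = [word for word in ("asthma", "smoker") if word in text]
        return Envelope(envelope.message_id, "concept-list", ",".join(found))


def dispatch(cell, envelope: Envelope) -> Envelope:
    # Transaction-level check: the platform compares datatypes and nothing else.
    if envelope.datatype != cell.accepts:
        raise ValueError(f"cell does not accept {envelope.datatype!r}")
    return cell.handle(envelope)


reply = dispatch(
    ConceptExtractionCell(),
    Envelope("m1", "text/plain", "Patient with asthma, former smoker."),
)
print(reply.body)
```

Since `dispatch` never reads the body, a Cell written in another language could sit behind the same envelope.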

Target layer for i2b2
[Layer diagram. Layers shown: TCP/IP; Web Services; i2b2 platform; Semantic Objects.]

Cell examples
- Concept extraction from clinical narratives
- Simple transformations; e.g., basic text format conversion
- Complex encoding; e.g., encoding MIAME in MAGE
- Microarray data normalization
- …

Exposing Cells
- Protocols layered on top of SOAP
- At the WSDL level for integrators; i.e., bioinformaticians & software engineers
- At a functional level for investigators
- i2b2 toolkits to allow integrators to expose controlled functionality to investigators (Automator)

Automator Approach
[Diagram labels: investigators; informaticians; Extend Kepler workflow engine; i2b2 Automator.]

Bird’s eye view
[Diagram labels: Workflow engine; Investigator Portal; CRC Repository.]

Current Implementation
- Extending Kepler workflow engine for i2b2
- Data model for CRC repository
- Defining protocols necessary for interaction (in addition to SOAP)
- Created Cell for concept extraction from narratives
- Early designs for Automator toolkit

i2b2 Architecture Key Points
- Leverage existing workflow standards and software
- Use Web services as basic form of interaction
- Assume unlimited choreography, but…
- Provide tools to distill complexity into basic automation for clinical investigators

SW Licensing and Distribution
- Commit to Open Source software
- Use GNU Lesser General Public License
- Establish local i2b2 repository exposed through i2b2 website
- Contribute to a more global NCBC SourceForge style repository if it emerges
  – ?NIH Forge
- Keep i2b2 protocols fully open

Interoperability across NCBC
- Strongly consider Web services as basic protocol for generic shared interactions
- Consider sharing datasets
- Promote diversity of approach and use of shared software (don’t impose uniformity)
- Facilitate/promote NCBC Open Source project teams

Pre-CRC Data Pipeline/Workflow: Populating the Clinical Research Chart (CRC)

Pre-CRC Data Pipeline/Workflow
- Use workflow framework to choreograph application services in specific sequences
- Used to extract, transform, conform, and load data and metadata into the CRC

Pre-CRC Data Pipeline/Workflow
[Diagram. Services: Ontology, Consent/Tracking, Application Pool, Management. Legend: data flowing, custom interfaces, SOAP/HTTP interfaces, a program. Labels: Input; Output; increasingly useful; Local or through SOAP service.]

Ontology Service
- Manages mappings of terms to common vocabularies
- Provides lists of acceptable (enumerated) values for various attribute and value slots
- Allows for management of hierarchies, groupings, and relationships between terms
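The three responsibilities listed for the Ontology Service can be sketched as three small lookups. This is an in-memory toy under stated assumptions, not the i2b2 service interface: the class and method names, the local term strings, the `DEMO-TOBACCO-USE` parent code, and the `Male` value are hypothetical; only the codes `CT-A-SMK` and `CT-A-NSK` and the `Sex_cd` slot appear elsewhere in this deck.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional


class OntologyService:
    """Hypothetical in-memory stand-in for an ontology service."""

    def __init__(self) -> None:
        self._term_to_code: Dict[str, str] = {}
        self._allowed: Dict[str, FrozenSet[str]] = {}
        self._parent: Dict[str, str] = {}

    # 1. Mapping of local terms to a common vocabulary code.
    def map_term(self, local_term: str, code: str) -> None:
        self._term_to_code[local_term.lower()] = code

    def lookup(self, local_term: str) -> Optional[str]:
        return self._term_to_code.get(local_term.lower())

    # 2. Enumerated values that a given slot may take.
    def set_allowed_values(self, slot: str, values: Iterable[str]) -> None:
        self._allowed[slot] = frozenset(values)

    def is_allowed(self, slot: str, value: str) -> bool:
        return value in self._allowed.get(slot, frozenset())

    # 3. Hierarchy between codes, walked from child to root.
    def set_parent(self, code: str, parent: str) -> None:
        self._parent[code] = parent

    def ancestors(self, code: str) -> List[str]:
        chain, seen = [], {code}
        # The seen set guards against an accidental cycle in the hierarchy.
        while code in self._parent and self._parent[code] not in seen:
            code = self._parent[code]
            seen.add(code)
            chain.append(code)
        return chain


ontology = OntologyService()
ontology.map_term("current smoker", "CT-A-SMK")
ontology.map_term("never smoked", "CT-A-NSK")
ontology.set_parent("CT-A-SMK", "DEMO-TOBACCO-USE")
ontology.set_parent("CT-A-NSK", "DEMO-TOBACCO-USE")
ontology.set_allowed_values("Sex_cd", ["Female", "Male"])

print(ontology.lookup("Current Smoker"), ontology.ancestors("CT-A-SMK"))
```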

Person Consent/Tracking Service
- Provides mappings between patient/subject identifiers
- Tracks patient/subject consent information
- Allows identification of the patient/subject based upon fuzzy demographic matches
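A minimal sketch of identifier mapping, consent tracking, and fuzzy demographic matching follows. The matching rule used here (exact birth date plus a name similarity score from Python's `difflib`) is an assumption chosen for brevity; the slide does not say how i2b2 performs the match. All names, identifiers, and the 0.85 threshold are invented for the example.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Tuple


class ConsentTracker:
    """Hypothetical in-memory consent and identifier tracking service."""

    def __init__(self) -> None:
        self._subjects: Dict[str, dict] = {}
        self._local_ids: Dict[Tuple[str, str], str] = {}

    def add_subject(self, subject_id: str, name: str, birth_date: str,
                    consented: bool) -> None:
        self._subjects[subject_id] = {
            "name": name, "birth_date": birth_date, "consented": consented,
        }

    def link(self, source: str, local_id: str, subject_id: str) -> None:
        # Map an identifier used by one source system to the shared subject id.
        self._local_ids[(source, local_id)] = subject_id

    def resolve(self, source: str, local_id: str) -> Optional[str]:
        return self._local_ids.get((source, local_id))

    def has_consent(self, subject_id: str) -> bool:
        return self._subjects[subject_id]["consented"]

    def fuzzy_match(self, name: str, birth_date: str,
                    threshold: float = 0.85) -> Optional[str]:
        # Require the same birth date, then pick the most similar name.
        best_id, best_score = None, threshold
        for subject_id, record in self._subjects.items():
            if record["birth_date"] != birth_date:
                continue
            score = SequenceMatcher(
                None, name.lower(), record["name"].lower()
            ).ratio()
            if score >= best_score:
                best_id, best_score = subject_id, score
        return best_id


tracker = ConsentTracker()
tracker.add_subject("S-1", "Jane Doe", "1950-01-01", consented=True)
tracker.link("demo-clinic", "A-17", "S-1")

print(tracker.resolve("demo-clinic", "A-17"))
print(tracker.fuzzy_match("Jane Do", "1950-01-01"))
```

A production matcher would weigh more demographic fields and handle ties; this version only shows where such logic would live.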

Application Pool (CVS) Service
- Stores programs/scripts used in pipeline
- Provides applications to be downloaded when needed
- Manages versioning of software
- Provides documentation

Management Service
- Stores workflow execution plan
- Starts and controls workflow execution
- Schedules workflow execution
- Monitors workflow execution and data locations
- Controls permissions associated with workflow execution
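The sketch below covers three of the listed duties (storing a plan, checking permissions, and monitoring status) and leaves scheduling out. It is an assumption-laden toy rather than the i2b2 Management Service: the class, the status strings, the `demo-load` plan, and the user names are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Set

Step = Callable[[list], list]


class ManagementService:
    """Hypothetical plan store with permission checks and status tracking."""

    def __init__(self) -> None:
        self._plans: Dict[str, List[Step]] = {}
        self._allowed: Dict[str, Set[str]] = {}
        self.status: Dict[str, str] = {}

    def store_plan(self, name: str, steps: Iterable[Step],
                   allowed_users: Iterable[str]) -> None:
        self._plans[name] = list(steps)
        self._allowed[name] = set(allowed_users)
        self.status[name] = "stored"

    def run(self, name: str, user: str, data: list) -> list:
        # Permission check happens before any step is started.
        if user not in self._allowed[name]:
            raise PermissionError(f"{user} may not run {name}")
        self.status[name] = "running"
        try:
            for step in self._plans[name]:
                data = step(data)
        except Exception:
            self.status[name] = "failed"
            raise
        self.status[name] = "completed"
        return data


manager = ManagementService()
manager.store_plan(
    "demo-load",
    steps=[
        lambda rows: [row.strip() for row in rows],  # tidy whitespace
        lambda rows: [row for row in rows if row],   # drop empty rows
    ],
    allowed_users={"informatician"},
)
loaded = manager.run("demo-load", "informatician", [" a ", "", "b"])
print(loaded, manager.status["demo-load"])
```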

Data Pipeline/Workflow Application Use Case for Asthma Data
[Diagram. Services: Ontology, Consent/Tracking, Application Pool, Management. Legend: data flowing, custom interfaces, SOAP/HTTP interfaces, a program; Input; Output. Data stores: RPDR, CRC DB, AsthmaMart. Steps: Data retrieval; Data de-identification; Language processing; Vocabulary matching; Load data into mart.]

Data Pipeline/Workflow Implementation
- Define standard XML representation for workflow - MoML
- Define standards for SOAP services and resource discovery
- Adopt and extend open source workflow package (Kepler)
- Prototypes by July timeframe
- BIRN -> NAMIC and LONI collaboration
- Can follow construction details at

Phenotype/Genotype Database

Phenotype/Genotype Database Principles
- Analytical database schema that does not need to change with new data types and concepts
- Defined fundamental unit of data (atomic fact) = observation
- Defined metadata strategy
- Various levels of de-identification (reviewed and approved by IRB)
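The first two principles can be illustrated with a small SQLite sketch: every datum is one observation row that points at a concept, so a new kind of data becomes a new concept row instead of a new column. The table names `concept` and `observation`, the `DEMO-` values, and the ISO date format are assumptions for illustration; the column names follow the use case slide in this deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE concept (
        concept_cd   TEXT PRIMARY KEY,
        concept_path TEXT,
        name_char    TEXT
    );
    CREATE TABLE observation (
        patient_id_e   TEXT,
        concept_cd     TEXT REFERENCES concept(concept_cd),
        start_date     TEXT,
        provider_id    TEXT,
        confidence_num REAL
    );
    """
)


def columns(table: str) -> list:
    # PRAGMA table_info returns one row per column; index 1 is the name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


schema_before = columns("observation")

# A new data type arrives: register a concept, then record a fact against it.
conn.execute(
    "INSERT INTO concept VALUES (?, ?, ?)",
    ("DEMO-GENO-1", r"Demo\Genotype\Marker 1", "Hypothetical genotype marker"),
)
conn.execute(
    "INSERT INTO observation VALUES (?, ?, ?, ?, ?)",
    ("DEMO-PT-1", "DEMO-GENO-1", "2005-01-01", "DEMO-PROV-1", 0.9),
)

schema_after = columns("observation")
fact_count = conn.execute("SELECT COUNT(*) FROM observation").fetchone()[0]
print(schema_before == schema_after, fact_count)
```

No `ALTER TABLE` is needed for the new marker, which is the sense in which the schema does not change with new data types.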

Phenotype/Genotype Database Architecture (see preprint)

Phenotype/Genotype Database Use Case
Smoking observations represented in database

Patient_id_e | Concept_cd | Start_date | Provider_id | Confidence_num
Z234 | CT-A-SMK | 1/1/1997 | M |
Z234 | CT-A-SMK | 1/1/1998 | M |
Z234 | IC | /1/2001 | M |
Z234 | CT-A-NSK | 1/1/2002 | M |

Patient_id_e | Birth_date | Sex_cd | Race_cd | Death_date
Z234 | 3/4/1924 | Female | Black | 4/5/2003

Provider_id | Provider_path | Name_char
M | MGH\Neurology\M | M

Concept_cd | Concept_path | Name_char
CT-A-SMK | AsthV1\DRptNLP\Tobacco Use\Smoker | Smoking
IC | V2\Diagnosis\Mental Disorders ( )\Non-psychotic disorders ( )\(305) Nondependent abuse of drugs\(305-1) Tobacco use disorder\( ) Tobacco use disorder, co~ | Tobacco Use Disorder, continuous use
CT-A-NSK | AsthV1\DRptNLP\Tobacco Use\Non smoker | Never smoked
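To show how such observation rows might be read back, the sketch below takes the three smoking observations whose values are fully legible in this use case and returns the most recent status for a patient. The `latest_status` helper and the tuple layout are hypothetical; the codes, dates, and names are the ones shown for patient Z234.

```python
from datetime import date
from typing import List, Optional, Tuple

# (Patient_id_e, Concept_cd, Start_date) for the legible smoking rows.
observations: List[Tuple[str, str, date]] = [
    ("Z234", "CT-A-SMK", date(1997, 1, 1)),
    ("Z234", "CT-A-SMK", date(1998, 1, 1)),
    ("Z234", "CT-A-NSK", date(2002, 1, 1)),
]

# Concept_cd -> Name_char, as in the concept table of the use case.
concept_names = {"CT-A-SMK": "Smoking", "CT-A-NSK": "Never smoked"}


def latest_status(patient_id: str,
                  rows: List[Tuple[str, str, date]]) -> Optional[str]:
    """Return the name of the most recent observation for one patient."""
    mine = [row for row in rows if row[0] == patient_id]
    if not mine:
        return None
    newest = max(mine, key=lambda row: row[2])
    return concept_names[newest[1]]


print(latest_status("Z234", observations))
```

The same rows also expose the history: two "Smoking" facts followed by a later "Never smoked" fact, a conflict that an analyst would have to resolve, possibly using the confidence column.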

Phenotype/Genotype Database Implementation
- Asthma CRC DB “primed” with data from 90,000 patients from Research Patient Data Registry
- Serves as fundamental data structure for i2b2 supported data Querying and Visualization Application Suite
- CRC DBs able to fuse seamlessly together
- Various levels of de-identification to be supported for data sharing and publication

Visualization and Analysis of CRC database: Post-CRC workflow

Visualization and Analysis Principles
- Supported application suite to query and view CRC database contents
- Outside applications for analysis and viewing able to plug in to application suite
- Pipeline/Workflow framework may be used for analysis and re-entry of derived data into CRC database

Visualization and Analysis Architecture
- Supported Applications, Querying and Visualization
  – Standard querying
  – Data exploration

Visualization and Analysis Architecture
- Supported Applications, ontology management
  – Ontology Management
- Integrate (outside?) population analysis applications

Visualization and Analysis Architecture
- Supported applications have plug-in architecture for outside analytic tools:
  – Standard web-link support with GET and POST oriented data transfer
  – Support transfer of specifically transformed data to outside applications
  – Complex analysis supported with workflow application
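A web-link plug-in of the kind described could hand data to an outside tool either in the query string (GET) or in a form-encoded body (POST). The sketch below only builds the two requests and sends nothing. The URL `https://analysis.example/view` and the parameter names `patient_set` and `concept` are placeholders invented for the example.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_get_link(base_url: str, params: dict) -> str:
    # GET: parameters travel in the URL, so the link can be embedded in a page.
    return f"{base_url}?{urlencode(params)}"


def build_post_request(url: str, params: dict) -> Request:
    # POST: parameters travel in the body, which suits larger payloads.
    body = urlencode(params).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


params = {"patient_set": "42", "concept": "CT-A-SMK"}
link = build_get_link("https://analysis.example/view", params)
request = build_post_request("https://analysis.example/view", params)

print(link)
print(request.get_method(), request.data)
```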

Visualization and Analysis Architecture - Query

Visualization and Analysis Architecture - Exploration

Visualization and Analysis Architecture – Ontology mgmt

Visualization and Analysis Use Case

Visualization and Analysis Implementation of analysis tools
- Workflow framework to accommodate external analytic applications
[Diagram labels: CRC DB; ProgID CA2.3; ProgID CX2.3; ProgID PN5.1; ProgID TH3.0; ProgID SN5.4; ProgID AA3.3; ProgID CN2.3; ProgID XN0.9; SN8745; PA5683; SNOMED CODE; patient id; account # 347; subject id 4.]

Final Assembly
[Diagram labels: statistics application server; ownership manager; encryption; population registry; database; microarray (encrypted); Gene-Chips; Outcomes calculated every week. Concepts shown: Gene expression in APOE ε4 allele; Alzheimer’s; Seizures; ER visits; Clinic visits; Surgery; Trauma; Multiple sclerosis; CT Scan; Hemorrhage; Thalamus; Diabetes. Table columns: person; concept; date; raw value. Person ids: Z5937X, Z5956X. Dates: 3/4, 3/9, 5/2, 4/6.]