Uniting i2b2.org and caGrid National scale data sharing networks for Biomedical Informatics research Rob Wynden – UCSF A collaborative effort of UCSF,

Presentation transcript:

Uniting i2b2.org and caGrid
National scale data sharing networks for Biomedical Informatics research
Rob Wynden – UCSF
A collaborative effort of UCSF, OSU, UCD, Rochester U, UPenn, U Washington, Wash U, and Partner’s Health

Challenges
Several challenges impede the task of launching an IDR (integrated data repository) and sharing that information for research purposes:
– Data Governance and Standardization
– Meeting the needs of researchers
– Semantic Interoperability

Data Governance
– It is very difficult to get approval to import data into an IDR installation.
– If we also required that data be encoded at the source in a particular standard format, approval would be even more difficult.
– Data translation during ETL (extract, transform and load) is also hard, because not all data needs to be so encoded and data must often be translated into multiple standard formats.

Meeting the needs of Researchers
– Researchers need data to be encoded in the format which is appropriate for their research specialty.
– No single data encoding is appropriate for all purposes.
– Researchers will also require access to the source information in unmodified form for verification purposes.

Semantic Interoperability
– For researchers within the same domain of study to share information and work together, that information must be encoded in a consistent format.
– Each research institution has information encoded in a unique fashion, which depends on the particular mix of source software environments used in clinical care, clinical research and bench science.

Ontology Mapper
– The Ontology Mapper maps local data (which is usually not formally encoded) into formally encoded data based on ISO/IEC data models which have been checked into the caDSR (Data Standards Repository). (It is an Instance Mapper.)
– XML based instance map definitions can be shared between institutions, either under a Creative Commons License or under a Commercial License after purchase.

Benefits of i2b2
– An open source translational informatics warehouse platform (an IDR)
– An active open source based user community
– Industry support (Sybase, HP, Sun …)
– A relatively easy platform into which to import source data regardless of its encoding
– Availability of a general purpose instance mapper for the translation of source data into standard encodings

Problems with i2b2 related to data sharing
– i2b2 lacks a mature data sharing capability which includes both general purpose semantic interoperability and security.
– i2b2 cannot interoperate with other IDRs which may not be on the same platform.

Benefits of caGRID
– Developed as part of the caBIG translational informatics effort
– caGRID is a mature data sharing network
– caGRID offers secure user authentication
– caGRID offers data sharing over a semantically interoperable network
– caGRID is platform agnostic and can be used to interconnect IDR environments regardless of the underlying technology (the design of caGRID is NOT specific to caBIG related systems)
– caGRID will eventually interoperate with Science Commons for accessing legal data access agreements

Problems with caGRID
– It is currently difficult to use caGRID on IDR projects. The caBIG project does not currently offer a general purpose IDR software environment.
– It is currently difficult to translate data into a format suitable for publication over caGRID.
– All caGRID based systems require that shared data be encoded in one or more standard formats, which usually do not match the format of our data sources.

The best of both worlds
– By combining the advantages of i2b2.org and caGRID we will provide a comprehensive solution to national scale data sharing.
– i2b2.org provides a relatively easy way of importing source data and translating that information into one or more standard formats.
– caGRID supplies a secure and semantically interoperable national scale network.

CTSA Collaborative Development
– The effort to combine i2b2.org with caGRID is a collaborative effort involving several CTSA sites.
– i2b2.org was first launched into open source by Partner’s Health and includes many CTSA award sites, including Harvard Med, UCSF, UCD, U Washington, Cincinnati Children’s, UT Houston, Rochester, UPenn and others.

Ontology Mapper Cell
– The Ontology Mapper Cell within i2b2 is a general purpose instance mapper which can translate messy local data into one or more standard formats. In other words, the Ontology Mapper maps local data into ontologies.
– Maps will be created and annotated in a Protégé Prompt plug-in and can be shared over HL7 CTS II, either as open source or as commercially sold assets.
– Maps contain routing, provenance information and a scriptlet payload of SQL, Perl, SPARQL, Horn or R.
– The Ontology Mapper Cell within i2b2 is a collaborative effort involving UCSF, UCD, Rochester, UPenn, and U Washington.
– This has been a highly active collaborative effort which is now in an Alpha release cycle.
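The contents of a map listed above (routing, provenance and a scriptlet payload) can be pictured as a small record. The following is only a sketch under stated assumptions: the class name `InstanceMap`, its field names and the `validate` helper are hypothetical and are not taken from the Ontology Mapper itself, whose maps are XML based.

```python
from dataclasses import dataclass
from typing import Dict, List

# Scriptlet languages named on the slide.
ALLOWED_LANGUAGES = {"SQL", "Perl", "SPARQL", "Horn", "R"}


@dataclass
class InstanceMap:
    """Hypothetical in-memory form of one shareable instance map."""
    source_encoding: str        # routing: the local encoding the data comes from
    target_encoding: str        # routing: the standard format it is mapped into
    provenance: Dict[str, str]  # e.g. who authored the map and under which license
    scriptlet_language: str     # one of ALLOWED_LANGUAGES
    scriptlet: str              # the payload that performs the translation


def validate(m: InstanceMap) -> List[str]:
    """Return a list of problems; an empty list means the map looks usable."""
    problems = []
    if m.scriptlet_language not in ALLOWED_LANGUAGES:
        problems.append("unsupported scriptlet language: " + m.scriptlet_language)
    if not m.scriptlet.strip():
        problems.append("empty scriptlet payload")
    if "author" not in m.provenance:
        problems.append("provenance is missing an author")
    return problems
```

Keeping provenance next to the payload means a receiving site can check where a map came from before running its scriptlet.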

caGRID Cell
– The caGRID Cell is a development project which is a collaboration of OSU (Ohio State) and UCSF.
– This component allows any i2b2 data mart, which has been translated into standard format by the Ontology Mapper, to share data over caGRID.
– This system will allow i2b2 to share data (a federated query) across any caGRID based data source (not just between other i2b2 instances).
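The idea of a federated query can be illustrated without any grid software: the same question is sent to every participating data source and the answers are merged, tagged with the site they came from. This sketch does not use the caGRID API. The names `make_source` and `federated_query`, the site labels and the concept codes are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[int, str, float]             # (patient_num, concept, value)
DataSource = Callable[[str], List[Row]]  # a site answers a query for one concept


def make_source(rows: List[Row]) -> DataSource:
    """Wrap an in-memory list of rows as a queryable data source."""
    def query(concept: str) -> List[Row]:
        return [r for r in rows if r[1] == concept]
    return query


def federated_query(sources: Dict[str, DataSource],
                    concept: str) -> List[Tuple[str, int, str, float]]:
    """Send the same query to every site and tag each row with its origin."""
    results = []
    for site in sorted(sources):
        for row in sources[site](concept):
            results.append((site, *row))
    return results
```

The sources do not need to be i2b2 instances. Anything that can answer the query in the shared standard encoding can take part.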

Query Interfaces
– caGRID based query: Work is under way to create a caGRID based query interface for the HSDB project (Wash U).
– i2b2 based query: This environment will be implemented as a plug-in for the i2b2 SHRINE environment.

Five pilot projects under way
There are currently FIVE data sharing projects which have all based their architectures on this work:
– HSDB (Human Studies Database, Ida Sim): the project for which this i2b2-caGRID architecture was first developed; it shares clinical research metadata. Sites: UCSF, Mayo Clinic, Wash U, UTSW, UCD
– QSN (The Quality Safety Network, Andy Auerbach): a national network of payer-derived and IDR-derived quality data. Sites: UCSF, Tufts, Northwestern, Kaiser, Michigan and 17 Payers
– STIRS (Cardiovascular Imaging Research Grid, Max Wintermark): UCSF, Georgetown, UCLA, Sutter Health Corp
– CHORI (Collab for Oral Health-Related Informatics, Joel White): UCSF, Harvard, UT Houston
– DBRD (Distributed Biobank for Rare Diseases, Jennifer Puck): UCSF, UT Southwestern, Emory, Duke
Total number of unique sites: 37
Number of sites already involved with the CTSA: 20 (almost all of these sites are heavily involved with at least one of these grid projects)

So how does it work? STEP 1
– First, data is ETL’ed (extracted, transformed and loaded) into the i2b2 schema.
– The i2b2 schema is based on Concept Table design, which is a derivative of fact table design.
– In concept table design each ‘name’ in the fact table is a hierarchical string of concepts.
– This architecture can be used to import (ETL) source data in any encoding, without requiring data standardization as a data governance task.

Concept Table Design
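A toy version of concept table design, in which each fact carries a hierarchical concept string, can be sketched with SQLite. This is a simplification for illustration only and is not the actual i2b2 schema. The table name `fact`, its columns, the `/` separator and the sample paths are all assumptions.

```python
import sqlite3


def build_demo() -> sqlite3.Connection:
    """Create an in-memory fact table keyed by hierarchical concept paths."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fact (patient_num INTEGER, concept_path TEXT, value TEXT)"
    )
    conn.executemany(
        "INSERT INTO fact VALUES (?, ?, ?)",
        [
            (1, "/Labs/Chemistry/Glucose", "105"),
            (1, "/Labs/Hematology/Hemoglobin", "13.9"),
            (2, "/Labs/Chemistry/Glucose", "98"),
            # A value still in a local, non-standard encoding loads just as easily.
            (2, "/Diagnoses/Endocrine/Diabetes", "local:DM2"),
        ],
    )
    return conn


def facts_under(conn: sqlite3.Connection, prefix: str):
    """Return every fact whose concept path starts with the given prefix."""
    cur = conn.execute(
        "SELECT patient_num, concept_path, value FROM fact "
        "WHERE concept_path LIKE ? ORDER BY patient_num, concept_path",
        (prefix + "%",),
    )
    return cur.fetchall()
```

Because the hierarchy lives in the string, selecting a whole branch such as `/Labs/` is a simple prefix match, and a new kind of source data only needs a new path rather than a schema change.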

So how does it work? STEP 2
– As data is imported, it is translated into one or more standard formats with the Ontology Mapper Cell.
– The Ontology Mapper uses HL7 CTS II shareable data translation rules to translate local data into one or more standard formats. (It is a general purpose instance mapper.)
– One-to-one maps, aggregates and archetype generation are all supported.
– The Ontology Mapper then publishes data into a data mart. Ontology Mapper data marts are database Views which can be ‘materialized’ into physical data marts if required.
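A one-to-one map, an aggregate, and a data mart published as a View that is later materialized can be sketched in SQLite as follows. This is a minimal sketch and not the Ontology Mapper's implementation. The table and view names and the `STD:GLUCOSE` code are hypothetical placeholders.

```python
import sqlite3


def build_mart() -> sqlite3.Connection:
    """Publish locally coded facts as a standard-coded data mart view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE local_fact (patient_num INTEGER, local_code TEXT, value REAL);
    CREATE TABLE code_map (local_code TEXT PRIMARY KEY, standard_code TEXT);

    -- One-to-one map: each mapped local code becomes its standard code.
    CREATE VIEW mart_view AS
        SELECT f.patient_num, m.standard_code, f.value
        FROM local_fact f JOIN code_map m ON m.local_code = f.local_code;

    -- Aggregate: one summary row per standard code.
    CREATE VIEW mart_avg AS
        SELECT standard_code, AVG(value) AS mean_value, COUNT(*) AS n
        FROM mart_view GROUP BY standard_code;
    """)
    conn.executemany("INSERT INTO local_fact VALUES (?, ?, ?)", [
        (1, "GLU_SERUM", 105.0), (2, "GLUC", 98.0), (2, "UNMAPPED", 1.0),
    ])
    conn.executemany("INSERT INTO code_map VALUES (?, ?)", [
        ("GLU_SERUM", "STD:GLUCOSE"), ("GLUC", "STD:GLUCOSE"),
    ])
    return conn


def materialize(conn: sqlite3.Connection) -> None:
    """Copy the view's current rows into a physical data mart table."""
    conn.execute("CREATE TABLE mart_table AS SELECT * FROM mart_view")
```

The source rows in `local_fact` are never rewritten, so the unmodified data stays available for verification. The view always reflects the current source data, while the materialized table is a snapshot taken when `materialize` runs.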

So how does it work? STEP 3
– The Ontology Mapper translates data into an ISO/IEC 11179 compliant data model.
– The Ontology Mapper Cell then publishes that data as a data mart (a View within the underlying database), with permissions within i2b2 aligned with the study protocol.
– Each data model is checked into the caDSR (data standards repository) to serve as a common standard reference.
– The caGRID Cell then provides a grid data service (created with the Introduce tool) which automatically performs the EAV to object-relational transform needed for i2b2 based data to be interoperable over caGRID.
– Data can then be queried via standard caGRID tools or via custom caGRID query environments if required (permissions are handled via Grid Grouper).
– Queries can be both intra- and inter-institutional.
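The EAV (entity-attribute-value) to object transform mentioned above amounts to pivoting many narrow rows into one typed object per entity. The sketch below shows only that idea and is not the grid data service generated by Introduce. The `LabPanel` class, its fields and the `STD:` concept codes are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class LabPanel:
    """Hypothetical object model: one object per patient."""
    patient_num: int
    glucose: Optional[float] = None
    hemoglobin: Optional[float] = None


# Which object attribute each standard concept code fills in (illustrative).
ATTRIBUTE_FOR_CONCEPT = {
    "STD:GLUCOSE": "glucose",
    "STD:HEMOGLOBIN": "hemoglobin",
}


def eav_to_objects(rows: Iterable[Tuple[int, str, float]]) -> List[LabPanel]:
    """Pivot (entity, attribute, value) rows into one object per entity."""
    by_patient: Dict[int, LabPanel] = {}
    for patient_num, concept, value in rows:
        attr = ATTRIBUTE_FOR_CONCEPT.get(concept)
        if attr is None:
            continue  # concepts outside the object model are skipped
        if patient_num not in by_patient:
            by_patient[patient_num] = LabPanel(patient_num)
        setattr(by_patient[patient_num], attr, value)
    return [by_patient[k] for k in sorted(by_patient)]
```

In this sketch a later row for the same patient and concept overwrites an earlier one. A real service would need an explicit rule for repeated observations.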

Combining i2b2 and caGRID
– By combining these techniques we can achieve the goal of a national scale, semantically interoperable data sharing network within the CTSA.
– This is a national collaborative effort involving many CTSA and caBIG based sites around the country.
– By working together as a team we are better equipped to achieve our goals of launching IDRs and sharing research information.

Thank you
Questions please
A collaborative effort of UCSF, OSU, UCD, Rochester U, UPenn, U Washington, Wash U, Partner’s Health and many others.
If you are interested in becoming a contributing member to this effort please contact