Published by Duane Benson. Modified over 9 years ago.
1
Uniting i2b2.org and caGrid
National-scale data sharing networks for biomedical informatics research
Rob Wynden – UCSF
A collaborative effort of UCSF, OSU, UCD, Rochester U, UPenn, U Washington, Wash U, and Partners HealthCare
2
Challenges
Several challenges impede the task of launching an IDR (integrated data repository) and sharing its information for research purposes:
– Data governance and standardization
– Meeting the needs of researchers
– Semantic interoperability
3
Data Governance
– It is very difficult to get approval to import data into an IDR installation.
– If we also required that data be encoded at the source in a particular standard format, approval would be even more difficult.
– Data translation during ETL (extract, transform and load) is also hard, because not all data needs to be encoded this way and data must often be translated into multiple standard formats.
4
Meeting the needs of Researchers
– Researchers need data to be encoded in the format that is appropriate for their research specialty.
– No single data encoding is appropriate for all purposes.
– Researchers will also require access to the source information in unmodified form for verification purposes.
5
Semantic Interoperability
– For researchers within the same domain of study to share information and work together, that information must be encoded in a consistent format.
– Each research institution has information encoded in a unique fashion, which depends on its particular mix of source software environments used in clinical care, clinical research and bench science.
6
Ontology Mapper
– The Ontology Mapper maps local data (which is usually not formally encoded) into formally encoded data based on ISO/IEC 11179 data models that have been checked into the caDSR (Data Standards Repository). It is an instance mapper.
– XML-based instance map definitions can be shared between institutions either under a Creative Commons license or, after purchase, under a commercial license.
7
Benefits of i2b2
– An open source translational informatics warehouse platform (an IDR)
– An active open source user community
– Industry support (Sybase, HP, Sun …)
– A relatively easy platform into which to import source data regardless of its encoding
– Availability of a general-purpose instance mapper for translating source data into standard encodings
8
Problems with i2b2 related to data sharing
– i2b2 lacks a mature data sharing capability that includes both general-purpose semantic interoperability and security.
– i2b2 cannot interoperate with other IDRs that are not on the same platform.
9
Benefits of caGRID
– Developed as part of the caBIG translational informatics effort
– caGRID is a mature data sharing network
– caGRID offers secure user authentication
– caGRID offers data sharing over a semantically interoperable network
– caGRID is platform agnostic and can be used to interconnect IDR environments regardless of the underlying technology (the design of caGRID is NOT specific to caBIG-related systems)
– caGRID will eventually interoperate with Science Commons for accessing legal data access agreements
10
Problems with caGRID
– It is currently difficult to use caGRID on IDR projects: the caBIG project does not currently offer a general-purpose IDR software environment.
– It is currently difficult to translate data into a format suitable for publication over caGRID.
– All caGRID-based systems require that shared data be encoded in standard format(s), which usually do not match the formats of our data sources.
11
The best of both worlds
– By combining the advantages of i2b2.org and caGRID we will provide a comprehensive solution to national-scale data sharing.
– i2b2.org provides a relatively easy way of importing source data and translating that information into one or more standard formats.
– caGRID supplies a secure and semantically interoperable national-scale network.
12
CTSA Collaborative Development
– The effort to combine i2b2.org with caGRID is a collaboration involving several CTSA sites.
– i2b2.org was first released as open source by Partners HealthCare, and its community includes many CTSA award sites, among them Harvard Med, UCSF, UCD, U Washington, Cincinnati Children's, UT Houston, Rochester and UPenn.
13
Ontology Mapper Cell
– The Ontology Mapper Cell within i2b2 is a general-purpose instance mapper that can translate messy local data into one or more standard formats. In other words, the Ontology Mapper maps local data into ontologies.
– Maps will be created and annotated in a Protégé Prompt plug-in and can be shared over HL7 CTS II, either as open source or as commercially sold assets.
– Maps contain routing information, provenance information and a scriptlet payload of SQL, Perl, SPARQL, Horn or R.
– The Ontology Mapper Cell within i2b2 is a collaborative effort involving UCSF, UCD, Rochester, UPenn, and U Washington.
– This has been a highly active collaboration, which is now in an alpha release cycle.
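The slide lists what a map carries (routing, provenance, and a scriptlet payload in one of five languages) without giving its schema. The following is only a rough sketch of that shape in Python; every field name here is hypothetical and is not taken from the Ontology Mapper's actual XML map format.

```python
from dataclasses import dataclass

# Scriptlet languages named on the slide.
SCRIPTLET_LANGUAGES = {"SQL", "Perl", "SPARQL", "Horn", "R"}


@dataclass(frozen=True)
class InstanceMap:
    """Hypothetical in-memory shape of one shareable instance map."""
    source_encoding: str   # routing: the local encoding the map reads from (assumed field)
    target_encoding: str   # routing: the standard encoding it writes to (assumed field)
    author: str            # provenance: who wrote the map (assumed field)
    created: str           # provenance: creation date as ISO 8601 text (assumed field)
    language: str          # scriptlet language, one of SCRIPTLET_LANGUAGES
    payload: str           # the scriptlet source itself

    def __post_init__(self):
        # Reject payload languages the slide does not list.
        if self.language not in SCRIPTLET_LANGUAGES:
            raise ValueError(f"unsupported scriptlet language: {self.language}")


example_map = InstanceMap(
    source_encoding="local-lab-codes",
    target_encoding="standard-lab-codes",
    author="example-institution",
    created="2009-01-01",
    language="SQL",
    payload="SELECT patient_id, value FROM local_fact WHERE local_code = 'GLU'",
)
```

Keeping routing and provenance next to the payload is what would let a receiving institution decide whether a shared map applies to its data before running it.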
14
caGRID Cell
– The caGRID Cell is a development project carried out jointly by OSU (Ohio State) and UCSF.
– This component allows any i2b2 data mart that has been translated into standard format by the Ontology Mapper to share data over caGRID.
– This system will allow i2b2 to share data (a federated query) across any caGRID-based data source, not just between i2b2 instances.
15
Query Interfaces
– caGRID-based query: work is under way to create a caGRID-based query interface for the HSDB project (Wash U).
– i2b2-based query: this environment will be implemented as a plug-in for the i2b2 SHRINE environment.
17
Five pilot projects under way
There are currently five data sharing projects that have all based their architectures on this work:
– HSDB (Human Studies Database – Ida Sim): the project for which this i2b2-caGRID architecture was first developed; it shares clinical research metadata. UCSF, Mayo Clinic, Wash U, UTSW, UCD
– QSN (The Quality Safety Network – Andy Auerbach): a national network of payer and IDR-derived quality data. UCSF, Tufts, Northwestern, Kaiser, Michigan and 17 payers
– STIRS (Cardiovascular Imaging Research Grid – Max Wintermark): UCSF, Georgetown, UCLA, Sutter Health Corp
– CHORI (Collab for Oral Health-Related Informatics – Joel White): UCSF, Harvard, UT Houston
– DBRD (Distributed Biobank for Rare Diseases – Jennifer Puck): UCSF, UT Southwestern, Emory, Duke
Total number of unique sites: 37
Number of sites already involved with the CTSA: 20 (almost all of these sites are heavily involved with at least one of these grid projects)
18
So how does it work? STEP 1
– First, data is ETL'ed (extract, transform, load) into the i2b2 schema.
– The i2b2 schema is based on concept table design, which is a derivative of fact table design.
– In concept table design each 'name' in the fact table is a hierarchical string of concepts.
– This architecture can be used to import (ETL) source data in any encoding, without requiring data standardization as a data governance task.
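To make the "hierarchical string of concepts" idea concrete, here is a minimal sketch in Python. The `Fact` fields, the `/` separator and the sample paths are illustrative assumptions, not the real i2b2 table or column names; the point is only that a path prefix selects a whole subtree of locally encoded facts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One row of a hypothetical fact table keyed by a concept path."""
    patient_id: int
    concept_path: str  # hierarchical 'name', here '/'-separated (an assumption)
    value: str


def facts_under(facts, prefix):
    """Return the facts whose concept path lies below `prefix` in the hierarchy."""
    # Normalise so that '/Local/Lab' does not accidentally match '/Local/Labs/...'.
    normalised = prefix if prefix.endswith("/") else prefix + "/"
    return [f for f in facts if f.concept_path.startswith(normalised)]


facts = [
    Fact(1, "/Local/Labs/Chemistry/Glucose/", "5.4"),
    Fact(1, "/Local/Labs/Hematology/Hemoglobin/", "13.9"),
    Fact(2, "/Local/Diagnoses/Endocrine/Diabetes/", "true"),
]

lab_facts = facts_under(facts, "/Local/Labs")
```

Because the hierarchy lives in the path string, a new local encoding can be loaded by adding rows with new paths, with no change to the table layout.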
19
Concept Table Design
20
So how does it work? STEP 2
– As data is imported, it is translated into one or more standard formats by the Ontology Mapper Cell.
– The Ontology Mapper uses data translation rules that are shareable over HL7 CTS II to translate local data into standard format(s). (It is a general-purpose instance mapper.)
– One-to-one maps, aggregates and archetype generation are all supported.
– The Ontology Mapper then publishes data into a data mart. Ontology Mapper data marts are database views, which can be 'materialized' into physical data marts if required.
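The view-based data mart idea in this step can be sketched with Python's built-in `sqlite3` module. The table names, column names and `STD:` codes below are invented for illustration and are not the Ontology Mapper's actual schema or rule format; the sketch shows a one-to-one map as a view, an aggregate layered on top of it, and a view being materialized into a physical table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE local_fact (patient_id INTEGER, local_code TEXT, value REAL);
CREATE TABLE instance_map (local_code TEXT PRIMARY KEY, standard_code TEXT);

INSERT INTO local_fact VALUES (1, 'GLU', 5.0), (1, 'GLU', 6.0), (2, 'HGB', 14.0);
INSERT INTO instance_map VALUES ('GLU', 'STD:GLUCOSE'), ('HGB', 'STD:HEMOGLOBIN');

-- One-to-one map: each local code is rewritten to exactly one standard code.
CREATE VIEW mart_one_to_one AS
SELECT f.patient_id, m.standard_code, f.value
FROM local_fact AS f
JOIN instance_map AS m ON m.local_code = f.local_code;

-- Aggregate map: several source rows collapse into one standard row.
CREATE VIEW mart_aggregate AS
SELECT patient_id, standard_code, AVG(value) AS mean_value
FROM mart_one_to_one
GROUP BY patient_id, standard_code;
""")

# 'Materialize' the view into a physical data mart table.
conn.execute("CREATE TABLE mart_materialized AS SELECT * FROM mart_aggregate")

rows = conn.execute(
    "SELECT patient_id, standard_code, mean_value "
    "FROM mart_materialized ORDER BY patient_id"
).fetchall()
```

Leaving the mart as a view keeps the untouched source rows as the single copy of the data; materializing trades that for query speed.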
22
So how does it work? STEP 3
– The Ontology Mapper translates data into an ISO/IEC 11179 compliant data model.
– The Ontology Mapper Cell then publishes that data as a data mart (a view within the underlying database), with permissions within i2b2 aligned with the study protocol.
– Each data model is checked into the caDSR (data standards repository) to serve as a common standard reference.
– The caGRID Cell then provides a grid data service (created with the Introduce tool) that automatically performs the EAV to object-relational transform needed for i2b2-based data to be interoperable over caGRID.
– Data can then be queried via standard caGRID tools, or via custom caGRID query environments if required (permissions are handled via Grid Grouper).
– Queries can be both intra- and inter-institutional.
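The EAV (entity-attribute-value) to object transform mentioned in this step can be illustrated with a small Python sketch. The `LabPanel` class and its attribute names are hypothetical stand-ins for a registered data model; the actual caGRID Cell service is generated by Introduce and is not reproduced here.

```python
from collections import defaultdict
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class LabPanel:
    """Hypothetical object model; one instance per patient."""
    patient_id: int
    glucose: Optional[float] = None
    hemoglobin: Optional[float] = None


def eav_to_objects(rows):
    """Pivot (entity, attribute, value) rows into one LabPanel per entity."""
    known = {f.name for f in fields(LabPanel)} - {"patient_id"}
    grouped = defaultdict(dict)
    for entity, attribute, value in rows:
        # Attributes the object model does not define are skipped.
        if attribute in known:
            grouped[entity][attribute] = value
    return [LabPanel(patient_id=e, **attrs) for e, attrs in sorted(grouped.items())]


eav_rows = [
    (1, "glucose", 5.0),
    (1, "hemoglobin", 14.0),
    (2, "glucose", 6.1),
    (2, "unmapped_local_code", 1.0),
]

panels = eav_to_objects(eav_rows)
```

The pivot turns many narrow rows into one typed object per entity, which is the shape an object-oriented grid service would expose.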
24
Combining i2b2 and caGRID
– By combining these techniques we can achieve the goal of a national-scale, semantically interoperable data sharing network within the CTSA.
– This is a national collaborative effort involving many CTSA and caBIG-based sites around the country.
– By working together as a team we are better equipped to achieve our goals of launching IDRs and sharing research information.
25
Thank you
Questions please
A collaborative effort of UCSF, OSU, UCD, Rochester U, UPenn, U Washington, Wash U, Partners HealthCare and many others.
If you are interested in becoming a contributing member of this effort, please contact rob.wynden@ucsf.edu