National Cancer Institute U.S. DEPARTMENT OF HEALTH AND HUMAN SERVICES National Institutes of Health CDR, an NCI-developed Vocabulary and Database for.

Slides:



Advertisements
Similar presentations
Medical Image Resource Center. What is MIRC? Medical Image Resource Center Makes it easier to locate and share electronic medical images and related information.
Advertisements

EDRN’s Validation Study Information Management System Developed for EDRN by the DMCC Cancer Biomarkers Group Division of Cancer Prevention Jet Propulsion.
© Copyright 2008, Mayo Clinic College of Medicine Mayo Clinic Open Health Tools Application for Membership OHT Board Meeting, Birmingham, UK July 1, 2008.
Massachusetts: Transforming the Healthcare Economy John D. Halamka MD CIO, Harvard Medical School and Beth Israel Deaconess Medical Center.
Who am I Gianluca Correndo PhD student (end of PhD) Work in the group of medical informatics (Paolo Terenziani) PhD thesis on contextualization techniques.
Biospecimens Carolyn Compton, MD, PhD James Robb, MD Office of Biorepositories and Biospecimen Research (OBBR), NCI June 25, 2007.
Aug. 20, JPL, SoCalBSI '091 The power of bioinformatics tools in cancer research Early Detection Research Network, JPL Mentors: Dr. Chris Mattmann,
Building Enterprise Applications Using Visual Studio ®.NET Enterprise Architect.
Overview of Biomedical Informatics Rakesh Nagarajan.
EleMAP: An Online Tool for Harmonizing Data Elements using Standardized Metadata Registries and Biomedical Vocabularies Jyotishman Pathak, PhD 1 Janey.
Ontology Based Clinical Trial Builder Norbert Graf, Fatima Schera, Gabriele Weiler University of the Saarland, Fraunhofer Institute, Germany Clinical Trial.
Fungal Semantic Web Stephen Scott, Scott Henninger, Leen-Kiat Soh (CSE) Etsuko Moriyama, Ken Nickerson, Audrey Atkin (Biological Sciences) Steve Harris.
CUMC IRB Investigator Meeting November 9, 2004 Research Use of Stored Data and Tissues.
The NIH Biomedical Translational Research Information System (BTRIS) Town Hall Meeting - Information Session February 26, 2008 Lipsett Auditorium.
A Primer on Healthcare Information Exchange John D. Halamka MD CIO, Harvard Medical School and Beth Israel Deaconess Medical Center.
What is “Biomedical Informatics”?. Biomedical Informatics Biomedical informatics (BMI) is the interdisciplinary field that studies and pursues.
Vivien Bonazzi Ph.D. Program Director: Computational Biology (NHGRI) Co Chair Software Methods & Systems (BD2K) Biomedical Big Data Initiative (BD2K)
Using the Drupal Content Management Software (CMS) as a framework for OMICS/Imaging-based collaboration.
NLM-Semantic Medline Data Science Data Publication Commons Dr. Brand Niemann Director and Senior Data Scientist/Data Journalist Semantic Community Data.
SCIENCE-DRIVEN INFORMATICS FOR PCORI PPRN Kristen Anton UNC Chapel Hill/ White River Computing Dan Crichton White River Computing February 3, 2014.
Biorepository Software Selection University of Michigan 31-Aug-2012 Frank Manion, Chief Information Officer Paul McGhee, Lead Business Analyst Cancer Center.
December 2006 MAGE and the Biospecimen Research Database Experiment Design and other issues Ian Fore, D.Phil U.S. National Cancer Institute - Center for.
Our Joint Playing Field: A Few Constants Change Change Our missions (if defined properly) Our missions (if defined properly) Importance of Community Engagement.
1 Matthew J. McAuliffe, Ph.D., Chief, Biomedical Imaging Research Services Section (BIRSS) CIT Ramona Hicks, Ph.D., Program Director, Repair and Plasticity.
LexEVS Overview Mayo Clinic Rochester, Minnesota June 2009.
Cancer Clinical Trial Suite (CCTS): An Introduction for Users A Tool Demonstration from caBIG™ Bill Dyer (NCI/Pyramed Research) June 2008.
Using the Open Metadata Registry (openMDR) to create Data Sharing Interfaces October 14 th, 2010 David Ervin & Rakesh Dhaval, Center for IT Innovations.
Lisa A. Lang, MPP Assistant Director for Health Services Research Information Head, National Information Center on Health Services Research and Health.
Survey of Medical Informatics CS 493 – Fall 2004 September 27, 2004.
Chapter 6 – Data Handling and EPR. Electronic Health Record Systems: Government Initiatives and Public/Private Partnerships EHR is systematic collection.
1 A National Virtual Specimen Database for Early Cancer Detection June 26, 2003 Daniel Crichton NASA Jet Propulsion Laboratory Sean Kelly NASA Jet Propulsion.
Value Set Resolution: Build generalizable data normalization pipeline using LexEVS infrastructure resources Explore UIMA framework for implementing semantic.
A School of Information Science, Federal University of Minas Gerais, Brazil b Medical University of Graz, Austria, c University Medical Center Freiburg,
Ontologies GO Workshop 3-6 August Ontologies  What are ontologies?  Why use ontologies?  Open Biological Ontologies (OBO), National Center for.
Protein Information Resource Protein Information Resource, 3300 Whitehaven St., Georgetown University, Washington, DC Contact
Sharing Ontologies in the Biomedical Domain Alexa T. McCray National Library of Medicine National Institutes of Health Department of Health & Human Services.
Data Integration and Management A PDB Perspective.
Clinical Collaboration Platform Overview ST Electronics (Training & Simulation Systems) 8 September 2009 Research Enablers  Consulting  Open Standards.
Clinical Research Informatics at the University of Michigan Daniel Clauw M.D. Professor of Medicine, Division of Rheumatology Assistant Dean for Clinical.
1 An Introduction to Ontology for Scientists Barry Smith University at Buffalo
NCCCP Biospecimens (BIOSPEC) Breakout Session Carolyn Compton, MD, PhD James Robb, MD Office of Biorepositories and Biospecimen Research (OBBR), NCI June.
High Risk 1. Ensure productive use of GRID computing through participation of biologists to shape the development of the GRID. 2. Develop user-friendly.
Semantic Media Wiki Open Terminology Development - Initial Steps - Frank Hartel, Ph.D. Associate Director, Enterprise Vocabulary Services National Cancer.
May 2007 CTMS / Imaging Interoperability Scenarios March 2009.
Welcome to the caBIG Community! The cancer Biomedical Informatics Grid (caBIG ® ) offers more than 120 open source tools, technologies and infrastructure.
Chang, Wen-Hsi Division Director National Archives Administration, 2011/3/18/16:15-17: TELDAP International Conference.
CaTissue Suite 1.2 TPBT Face to Face Michelle Lee, MBA, Ph.D. Ian Fore, D. Phil. December, 2009.
Biomedical Informatics and Health. What is “Biomedical Informatics”?
Data Coordinating Center University of Washington Department of Biostatistics Elizabeth Brown, ScD Siiri Bennett, MD.
Informatics for Scientific Data Bio-informatics and Medical Informatics Week 9 Lecture notes INF 380E: Perspectives on Information.
Semantic Interoperability: caCORE and the Cancer Data Standards Repository (caDSR)  Jennifer Brush.
Copyright © 2009 by The McGraw-Hill Companies, Inc. All Rights Reserved. McGraw-Hill/Irwin Chapter 2 Clinical Information Standards – Unit 3 seminar Electronic.
C3PR: An Introduction for Users A Tool Demonstration from caBIG™ Vijaya Chadaram Duke Cancer Center April 29, 2008.
Global Specimen Identifiers TBPT Workspace Call 19 July 2010.
Towards a unified MOD resource: An Overview
Building Enterprise Applications Using Visual Studio®
To develop the scientific evidence base that will lessen the burden of cancer in the United States and around the world. NCI Mission Key message:
Semantic Web - caBIG Abstract: 21st century biomedical research is driven by massive amounts of data: automated technologies generate hundreds of.
The UMLS and the Semantic Web
The National Center for Biomedical Ontology
Patient Biospecimen Preanalytics:
Solutions to Clinical Data Visualization and Analysis
Joslynn Lee – Data Science Educator
Unit 5 Systems Integration and Interoperability
Development of A Digital Pathology Database for Annotation and Quality Management of a Brain Tumor Biorepository Jeremy Molligan, MD Pathology.
What is “Biomedical Informatics”?
Carolina Mendoza-Puccini, MD
What is “Biomedical Informatics”?
The National Center for Biomedical Ontology
UAB Tissue Biorepository
Presentation transcript:

National Cancer Institute U.S. DEPARTMENT OF HEALTH AND HUMAN SERVICES National Institutes of Health CDR, an NCI-developed Vocabulary and Database for Biobanking Helen Moore and Ping Guan Biorepositories and Biospecimen Research Branch CTS Ontology Workshop 2015 Sept. 23, 2015

Improving Biospecimen Processes is Essential to Enable Better Research, Clinical Trials, and Molecular Medicine Biospecimen Collection (Blood, Tissues, Urine, etc.) Processing in Pathology Lab Patient Care Clinical Trials Research StorageAnalysis Adapted from Peggy Devine Clinical Data Collection

Biorepositories and Biospecimen Research Branch (BBRB) Mission –BBRB provides leadership, tools, resources, and policies in biobanking for the global biomedical research community, to enable translational research and precision medicine for patients.

Current Initiatives NCI Best Practices for Biospecimen Resources Biospecimen Preanalytical Variables Program (BPV) Genotype-Tissue Expression Program (GTEx) ELSI research in biobanking Biospecimen Research Database (BRD) – online literature and SOPs database Biospecimen Evidence-Based Practices (BEBPs) Biobank economics research and online tools Patient brochures CDR collaboration

Comprehensive Data Resource (CDR) Supports two ongoing biospecimen programs: –The Genotype-Tissue Expression Program (GTEx) – a NIH Common Fund study of genomic variation and tissue-specific expression, analyzing up to 30 tissues per donor in 900 deceased donors. –The Biospecimen Preanalytical Variables program (BPV) – a study of preanalytical variation in tissue processing and storage (FFPE and frozen tissues) and the effects of such variation on downstream molecular analysis.

Program Needs Driving CDR Development NIH Common Fund Project: GTEx –Normal/Non-diseased –Postmortem –30+ tissue types per donor –960 donors

Science : A representation of how variation in the human genome affects gene expression among individuals and tissues. Colors and shapes show variations between people and within individuals. The Genotype-Tissue Expression (GTEx) Consortium examined postmortem tissue to document how genetic variants confer differences in gene expression across the human body. See pages 618, 640, 648, 660, and 666.

Program Needs Driving CDR Development Biospecimen science project: BPV –Cancer patients –Surgical tissues –Studies of predefined preanalytical factors –Multi-site experimental design and SOPs

Recorded/annotated -Anesthesia -Intra-operative ischemia -Many other variables -Post-operative ischemia -Room temperature -Type of preservative -Rate of freezing/fixing Tissue processing -Multiple formulaic variables - Multiple time settings for each H&E IHC FISH RNA isolation Storage

Biospecimen Collection, Processing, Storage Data Blood and tissue collection and processing data –Blood tube type, time stamps for processing –For resected tissues: surgical clamp times, time placed in fixative or frozen, time placed in tissue processor, etc. –Storage conditions Pathology QC

Data to be Collected and Managed Donor parameters –SOP –Consent verification –Maintaining privacy Biospecimen collection, processing, storage parameters –SOPs –Electronic CRFs with time stamps etc. Physical transfers of biospecimens –MTAs –Chain of custody information Clinical data about the donor Pathology data –Diagnosis (donor level data from path report) –Diagnosis and additional observations (individual biospecimen level data) IDs of donors, cases, and derived biospecimens

Patient Acquisition Handling/ Processing StorageDistribution Scientific Analysis Medical/ Surgical Procedures Knowledge Base Time 0 Post-acquisitionPre-acquisition Specimen is viable and biologically reactive Molecular composition subject to further alteration/degradation What Types of Information Do We Want to Capture?

Program requirements for CDR Functions Development of CDEs to thoroughly annotate the biospecimen life cycle to support the goals of the project Development of workflow-based annotation with live data entry at BSS when possible Provision of IDs for the project Record data at the BSS and transmit to project homepage (annotation, gross pathology images) Monitor shipping between different program sites GTEx Data flow in CDR

CDR Overview CDR is a distributed bioinformatics tissue collection and information management platform –Built upon open source technologies and frameworks –Supports the needs of complex projects that require collecting a large number of high quality, well-annotated human biospecimens. CDR is a custom software solution –Incorporates specific biospecimen procurement SOPs –Provides data security for HIPAA-compliant limited dataset –Provides real-time distributed data services between multiple centers nationwide.. CDR is designed to follow the NCI Best Practices for biospecimen collection and annotation.

CDR Technology The CDR is built on the Grails Framework –Leverages Groovy scripting for rapid development and easy learning curve –Leverages Spring for security –Leverages Hibernate for Object - Relational mapping, keeping the application database agnostic –Enables rapid and flexible web service API development, enabling the CDR to interconnect with multiple institutions and systems

Data Flow in CDR Specimens Data Comprehensive Data Resource (CDR) Laboratory, Data Analysis & Coordinating Center Pathology Resource Center Specimens Key Represents Kits and Biospecimens Represents Data Biospecimen Source Sites Participating Hospitals Biospecimen Source Sites Participating Hospitals Comprehensive Biospecimen Resource Kits Data Management and Quality Management

CDR – Can This be Useful to the Biobanking Community? Unmet needs for management software in biobanking community Facilitate use of Best Practices and annotation of biospecimen collection and processing steps CDR is being adopted for other NCI programs including the CPTAC program (Clinical Proteomic Tumor Analysis). The CDR code was posted last year. Collaborative Announcement:

CDR – Collaborative Project Voluntary collaboration: –No funding to individual collaborator(s) –Collaborator(s) have their own IT capacity to further develop and customize the software Informational sessions –Two Webinars in July: Program introduction and live demo –100+ participants each time Collaborators who are interested in the collaboration will provide: –Intended area of research to support after adopting CDR –IT experience and expertise in the proposed adoption –Relevance to biobanking operation –Contributions in standardizing and streamlining biobanking practices.

Standardized Terminology & Definition Different data types & sources in CDR –Data entered into CDR comes from a variety of sources and organizations: Different Biospecimen Source Sites for data entry Various data types: –Operational data –Clinical Data caHUB Enterprise Vocabulary Service –caHUB EVS Provide standardized terminology and definitions that serve as a consistent basis for data integration and data sharing across the program.

EVS Architecture Project requirements Data call from users External Sources Indexed full-text search

EVS Tools Protégé –Protégé is a free, open source ontology editor and knowledge-base framework developed at Stanford. –The caHUBt (caHUB Thesaurus) is documented and managed in NCI Protégé. SOLR –Solr is an open source enterprise search platform from the Apache Lucene project. –SOLR provides distributed search and index replication. –SOLR supports full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Providing distributed search and index replication. –caHUBt is fully indexed through SOLR

EVS Components GTEx, BMS and BPV form elements defined in Protégé as Common Data Elements (CDEs) –Form Elements and valid values definitions stored in Protégé –Exported to SOLR for display on CDR Valid Values defined in SOLR for display in CDR: –Cause of Death, Source ICD10-CM Synonyms from UMLS –Medications, source FDA NDC List –Primary Cancer Type, source PDQ Disease List –Medical Procedures, source AMA CPT List Synonyms from UMLS

Common Data Elements (CDEs) The CDEs is the repository of locally defined data elements used on the forms in the CDR. These definitions were provided by the study in which the data element is leveraged, or were developed by the Vocabulary team leveraging standard sources such as the NCIt and the UMLS Example: BMI, gender.

Causes of Death (COD) “Cause of Death” concepts –Immediate –First Underlying –Last Underlying –Death Certificate –CDR form element “Death Circumstances” for GTEx study Valid Value Set Sources: –the caHUBt in NCI Protégé, –the ICD10-CM, –the NLM Unified Medical Language System (UMLS) The ICD10-CM provides the foundation of the valid value list that is bound to each of the four “cause of death” concepts.

Medications (RX) “Current Medications” concept –CDR Form element for the GTEx study. Valid Value Set Source: –FDA NDC Directory, which is published by FDA on a weekly basis.

Primary Cancer Type (PCT) “Primary Cancer Type” concept: –CDR Form element for GTEx study. Valid Value Set Source: –the PDQ Disease List

Medical Procedures (CPT) “Underlying Conditions” concept: –CDR Form element for the GTEx Valid Value Set Sources: –UMLS –The AMA’s Current Procedural Terminology (CPT)

Sharing the caHUB Vocabulary with the Research Community Publish the data set on public-accessible sites –NIH CDE Portal –NCBO BioPortal –OBO Foundry