National Cancer Institute U.S. DEPARTMENT OF HEALTH AND HUMAN SERVICES National Institutes of Health CDR, an NCI-developed Vocabulary and Database for.


1 National Cancer Institute U.S. DEPARTMENT OF HEALTH AND HUMAN SERVICES National Institutes of Health CDR, an NCI-developed Vocabulary and Database for Biobanking Helen Moore and Ping Guan Biorepositories and Biospecimen Research Branch CTS Ontology Workshop 2015 Sept. 23, 2015

2 Improving Biospecimen Processes is Essential to Enable Better Research, Clinical Trials, and Molecular Medicine [Diagram labels: Biospecimen Collection (Blood, Tissues, Urine, etc.); Clinical Data Collection; Processing in Pathology Lab; Storage; Analysis; Patient Care; Clinical Trials; Research.] Adapted from Peggy Devine

3 Biorepositories and Biospecimen Research Branch (BBRB) Mission –BBRB provides leadership, tools, resources, and policies in biobanking for the global biomedical research community, to enable translational research and precision medicine for patients. http://biospecimens.cancer.gov/

4 Current Initiatives NCI Best Practices for Biospecimen Resources Biospecimen Preanalytical Variables Program (BPV) Genotype-Tissue Expression Program (GTEx) ELSI research in biobanking Biospecimen Research Database (BRD) – online literature and SOPs database Biospecimen Evidence-Based Practices (BEBPs) Biobank economics research and online tools Patient brochures CDR collaboration

5 Comprehensive Data Resource (CDR) Supports two ongoing biospecimen programs: –The Genotype-Tissue Expression Program (GTEx) – an NIH Common Fund study of genomic variation and tissue-specific expression, analyzing up to 30 tissues per donor in 900 deceased donors. –The Biospecimen Preanalytical Variables program (BPV) – a study of preanalytical variation in tissue processing and storage (FFPE and frozen tissues) and the effects of such variation on downstream molecular analysis.

6 Program Needs Driving CDR Development NIH Common Fund Project: GTEx –Normal/Non-diseased –Postmortem –30+ tissue types per donor –960 donors

7 Science: A representation of how variation in the human genome affects gene expression among individuals and tissues. Colors and shapes show variations between people and within individuals. The Genotype-Tissue Expression (GTEx) Consortium examined postmortem tissue to document how genetic variants confer differences in gene expression across the human body. See pages 618, 640, 648, 660, and 666.

8 Program Needs Driving CDR Development Biospecimen science project: BPV –Cancer patients –Surgical tissues –Studies of predefined preanalytical factors –Multi-site experimental design and SOPs

9 [Diagram labels: Recorded/annotated –Anesthesia –Intra-operative ischemia –Many other variables –Post-operative ischemia –Room temperature –Type of preservative –Rate of freezing/fixing; Tissue processing –Multiple formulaic variables –Multiple time settings for each; Storage; H&E; IHC; FISH; RNA isolation.]

10 Biospecimen Collection, Processing, Storage Data Blood and tissue collection and processing data –Blood tube type, time stamps for processing –For resected tissues: surgical clamp times, time placed in fixative or frozen, time placed in tissue processor, etc. –Storage conditions Pathology QC
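The time stamps listed on this slide are what make preanalytical intervals computable after the fact. A minimal Python sketch of that idea follows; the field names (`clamp_time`, `fixative_time`, `processor_time`) and the sample values are assumptions made for illustration and are not taken from the CDR data model.

```python
from datetime import datetime, timedelta

# Hypothetical record for one resected tissue specimen. The field names and
# values are illustrative only; they are not the CDR's actual schema.
record = {
    "clamp_time": "2015-03-02T09:15",
    "fixative_time": "2015-03-02T09:47",
    "processor_time": "2015-03-02T21:47",
}


def interval(rec: dict, start: str, end: str) -> timedelta:
    """Return the elapsed time between two ISO 8601 time stamps in a record."""
    t0 = datetime.fromisoformat(rec[start])
    t1 = datetime.fromisoformat(rec[end])
    if t1 < t0:
        # An end stamp before the start stamp indicates a data entry error.
        raise ValueError(f"{end} precedes {start}")
    return t1 - t0


delay_to_fixation = interval(record, "clamp_time", "fixative_time")
time_in_fixative = interval(record, "fixative_time", "processor_time")
print(delay_to_fixation, time_in_fixative)
```

Rejecting out-of-order stamps at entry time is one way a form could catch transcription errors before they reach downstream analysis.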

11 Data to be Collected and Managed Donor parameters –SOP –Consent verification –Maintaining privacy Biospecimen collection, processing, storage parameters –SOPs –Electronic CRFs with time stamps etc. Physical transfers of biospecimens –MTAs –Chain of custody information Clinical data about the donor Pathology data –Diagnosis (donor level data from path report) –Diagnosis and additional observations (individual biospecimen level data) IDs of donors, cases, and derived biospecimens

12 What Types of Information Do We Want to Capture? [Diagram labels: Patient; Medical/Surgical Procedures; Acquisition; Handling/Processing; Storage; Distribution; Scientific Analysis; Knowledge Base. Time markers: Pre-acquisition; Time 0; Post-acquisition. Annotations: Specimen is viable and biologically reactive; Molecular composition subject to further alteration/degradation.]

13 Program requirements for CDR Functions Development of CDEs to thoroughly annotate the biospecimen life cycle to support the goals of the project Development of workflow-based annotation with live data entry at BSS when possible Provision of IDs for the project Record data at the BSS and transmit to project homepage (annotation, gross pathology images) Monitor shipping between different program sites GTEx Data flow in CDR

14 CDR Overview CDR is a distributed bioinformatics tissue collection and information management platform –Built upon open source technologies and frameworks –Supports the needs of complex projects that require collecting a large number of high quality, well-annotated human biospecimens. CDR is a custom software solution –Incorporates specific biospecimen procurement SOPs –Provides data security for HIPAA-compliant limited dataset –Provides real-time distributed data services between multiple centers nationwide. CDR is designed to follow the NCI Best Practices for biospecimen collection and annotation.

15 CDR Technology The CDR is built on the Grails Framework –Leverages Groovy scripting for rapid development and easy learning curve –Leverages Spring for security –Leverages Hibernate for Object - Relational mapping, keeping the application database agnostic –Enables rapid and flexible web service API development, enabling the CDR to interconnect with multiple institutions and systems

16 Data Flow in CDR [Diagram labels: Biospecimen Source Sites (Participating Hospitals); Comprehensive Biospecimen Resource; Pathology Resource Center; Laboratory, Data Analysis & Coordinating Center; Comprehensive Data Resource (CDR): Data Management and Quality Management. Key: one arrow style represents kits and biospecimens, the other represents data.]

17 CDR – Can This be Useful to the Biobanking Community? Unmet needs for management software in biobanking community Facilitate use of Best Practices and annotation of biospecimen collection and processing steps CDR is being adopted for other NCI programs including the CPTAC program (Clinical Proteomic Tumor Analysis). The CDR code was posted last year. Collaborative Announcement: https://ttc.nci.nih.gov/opportunities/opportunity.php?opp_id=748093 754466223

18 CDR – Collaborative Project Voluntary collaboration: –No funding to individual collaborator(s) –Collaborator(s) have their own IT capacity to further develop and customize the software Informational sessions –Two Webinars in July: Program introduction and live demo –100+ participants each time Collaborators who are interested in the collaboration will provide: –Intended area of research to support after adopting CDR –IT experience and expertise in the proposed adoption –Relevance to biobanking operation –Contributions in standardizing and streamlining biobanking practices.

19 Standardized Terminology & Definition Different data types & sources in CDR –Data entered into CDR comes from a variety of sources and organizations: Different Biospecimen Source Sites for data entry Various data types: –Operational data –Clinical Data caHUB Enterprise Vocabulary Service –caHUB EVS provides standardized terminology and definitions that serve as a consistent basis for data integration and data sharing across the program.

20 EVS Architecture Project requirements Data call from users External Sources Indexed full-text search

21 EVS Tools Protégé –Protégé is a free, open source ontology editor and knowledge-base framework developed at Stanford. –The caHUBt (caHUB Thesaurus) is documented and managed in NCI Protégé. Solr –Solr is an open source enterprise search platform from the Apache Lucene project. –Solr provides distributed search and index replication. –Solr supports full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. –caHUBt is fully indexed through Solr.
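The search features named on this slide (full-text search, hit highlighting, faceted search) correspond to standard Solr request parameters on the `/select` handler. The sketch below only builds such a request URL; the host, the core name `cahubt`, and the facet field `source` are hypothetical and do not describe the actual caHUB deployment.

```python
from urllib.parse import urlencode


def solr_select_url(base_url: str, core: str, text: str, facet_field: str) -> str:
    """Build a Solr /select URL with a full-text query, highlighting, and one facet."""
    params = [
        ("q", text),                   # full-text query string
        ("hl", "true"),                # turn on hit highlighting
        ("facet", "true"),             # turn on faceted search
        ("facet.field", facet_field),  # field to compute facet counts on
        ("wt", "json"),                # ask for a JSON response
    ]
    return f"{base_url}/{core}/select?{urlencode(params)}"


# Hypothetical host, core, and field names, used for illustration only.
url = solr_select_url("http://localhost:8983/solr", "cahubt", "heart failure", "source")
print(url)
```

Building the query string with `urlencode` keeps multi-word search terms correctly escaped without any manual string handling.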

22 EVS Components GTEx, BMS and BPV form elements defined in Protégé as Common Data Elements (CDEs) –Form elements and valid value definitions stored in Protégé –Exported to Solr for display in CDR Valid values defined in Solr for display in CDR: –Cause of Death: source ICD10-CM, synonyms from UMLS –Medications: source FDA NDC List –Primary Cancer Type: source PDQ Disease List –Medical Procedures: source AMA CPT List, synonyms from UMLS

23 Common Data Elements (CDEs) The CDE repository holds the locally defined data elements used on the forms in the CDR. These definitions were provided by the study in which the data element is used, or were developed by the Vocabulary team using standard sources such as the NCIt and the UMLS. Examples: BMI, gender.
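One way to picture a locally defined data element is as a small record that carries a name, a definition, the origin of that definition, and an optional list of valid values. The Python sketch below is an assumption about shape only: the class, its field names, the placeholder definitions, and the example value list are all hypothetical and are not taken from caHUBt.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CommonDataElement:
    """One locally defined data element; all field names are illustrative."""
    name: str
    definition: str
    origin: str  # where the definition came from, e.g. a study, NCIt, or UMLS
    valid_values: Optional[Tuple[str, ...]] = None  # None means free entry

    def accepts(self, value: str) -> bool:
        """Return True when the value is allowed for this element."""
        return self.valid_values is None or value in self.valid_values


# Placeholder definitions and values, not the wording stored in caHUBt.
gender = CommonDataElement(
    "gender", "Placeholder definition.", "NCIt", ("Female", "Male", "Unknown")
)
bmi = CommonDataElement("BMI", "Placeholder definition.", "study-defined")

print(gender.accepts("Female"), gender.accepts("F"), bmi.accepts("27.4"))
```

Keeping the value list on the element itself means a form can validate an entry without knowing which external source the list was derived from.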

24 Causes of Death (COD) “Cause of Death” concepts –Immediate –First Underlying –Last Underlying –Death Certificate –CDR form element “Death Circumstances” for GTEx study Valid Value Set Sources: –the caHUBt in NCI Protégé, –the ICD10-CM, –the NLM Unified Medical Language System (UMLS) The ICD10-CM provides the foundation of the valid value list that is bound to each of the four “cause of death” concepts.
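The binding described on this slide, one ICD10-CM-based value list shared by the four "cause of death" concepts with synonyms layered on top, can be sketched as follows. This is an illustration under stated assumptions rather than the CDR implementation: the two codes are a tiny excerpt of ICD-10-CM, and the synonym table is a hypothetical stand-in for synonyms drawn from the UMLS.

```python
from typing import Optional

# The four "cause of death" concepts named on the slide.
COD_CONCEPTS = ("Immediate", "First Underlying", "Last Underlying", "Death Certificate")

# Tiny illustrative excerpt of ICD-10-CM; the real list is far larger.
ICD10CM = {
    "I50.9": "Heart failure, unspecified",
    "J18.9": "Pneumonia, unspecified organism",
}

# Hypothetical synonym table standing in for synonyms drawn from the UMLS.
SYNONYMS = {"cardiac failure": "I50.9"}

# Every concept is bound to the same underlying value list.
BINDINGS = {concept: ICD10CM for concept in COD_CONCEPTS}


def resolve(concept: str, entry: str) -> Optional[str]:
    """Map a form entry (code, label, or synonym) to a code, or None if invalid."""
    values = BINDINGS[concept]  # raises KeyError for an unknown concept
    text = entry.strip()
    if text.upper() in values:  # entry is already a code
        return text.upper()
    lowered = text.lower()
    for code, label in values.items():  # entry is a preferred label
        if label.lower() == lowered:
            return code
    code = SYNONYMS.get(lowered)  # entry is a known synonym
    return code if code in values else None


print(resolve("Immediate", "Cardiac failure"))
print(resolve("Death Certificate", "j18.9"))
print(resolve("First Underlying", "not a diagnosis"))
```

Resolving synonyms to a single code at entry time is what lets data from different source sites be compared on the same terms.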

25 Medications (RX) “Current Medications” concept –CDR Form element for the GTEx study. Valid Value Set Source: –FDA NDC Directory, which is published by FDA on a weekly basis.

26 Primary Cancer Type (PCT) “Primary Cancer Type” concept: –CDR Form element for GTEx study. Valid Value Set Source: –the PDQ Disease List

27 Medical Procedures (CPT) “Underlying Conditions” concept: –CDR Form element for the GTEx study. Valid Value Set Sources: –UMLS –The AMA’s Current Procedural Terminology (CPT)

28 Sharing the caHUB Vocabulary with the Research Community Publish the data set on public-accessible sites –NIH CDE Portal –NCBO BioPortal –OBO Foundry

