Introduction
Barry Smith
BISC Team
National Center for Biomedical Ontology (NCBO)
collaboration of:
− Stanford Biomedical Informatics Research
− The Mayo Clinic
− University at Buffalo
Advisory Board Member
− Multiscale Systems Immunology for Adjuvant Development (Duke University / NIAID)
− Gene Ontology Consortium
− Cleveland Clinic Semantic Database in Cardiothoracic Surgery
− Advancing Clinico-Genomic Trials on Cancer (ACGT)
− Ontology for Clinical Research (OCRe)
Consultant
− Modelling Immunity for Defense Center (PRIME), Mount Sinai School of Medicine (NIAID)
− Institute of Health Policy Studies, University of California, San Francisco
− German Federal Ministry of Health
− DoD Joint Forces Command
The problem: too much data, too many incompatible formats and standards
The solution(s)
– Post-coordination
– Pre-coordination
Post-coordination = arm's-length enhancement of data
– PIs, hospitals, biostatisticians, Rho …: lots of free-text protocols, local formats, local standards, local terminologies operating here
– Northrop Grumman, Stanford (Max & Mindy): uniform standards applied post hoc
Pre-coordination
– PIs, hospitals, biostatisticians, Rho …: some uniform standards applied already here
– Northrop Grumman, Stanford (Max & Mindy)
Your data is already being subjected to some pre-coordination
For example
– where you need to meet FDA requirements (CDISC …)
– when your data is packaged for submission to ImmPort
But this
– is uncoordinated
– uses standards of varying quality
– is inefficient (costs money)
– is out of your control
Goal of this meeting
Explore incremental steps towards some pre-coordination
– Standards
– Libraries
ITN Data
– HLA data (purple)
– Flow Cytometry data (yellow)
– PCR data (green)
– Study Protocol, Operational data, Clinical data (blue)
– Specimen Management Data (green)
What is in a visit name? (ITN)
Visit 0, v0, v 0, 0, Day 0, Transplant
[Diagram, from Ravi Shankar, Stanford: the variants above appear at different points in the data flow. Diagram labels: Protocol Group, Assay Group, CRO, Cimarron, Operations Group, Tube Manufacturer, Data Center, Core Labs, Schedule of Events, Specimen Table, Tube Table, CRF, ImmunoTrak, Kit Report, Database, Assays]
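One way to picture what pre-coordination would mean for visit names is a small normalizer that maps each variant on the slide to one agreed label. The following is a minimal sketch only: the canonical form "Visit N", the function name `normalize_visit`, and the rule that "Transplant" and "Day 0" both denote visit 0 are assumptions made for illustration, not conventions stated by ITN or ImmPort.

```python
import re

# Hypothetical canonical label; the slides do not prescribe one.
CANONICAL_FORMAT = "Visit {n}"

# Assumption: in this protocol the transplant happens at visit 0 on study day 0.
# Other "Day N" labels are deliberately NOT mapped, since day N is not visit N in general.
NAMED_VISITS = {"transplant": 0, "day 0": 0}

# Accepts "Visit 0", "v0", "v 0", and a bare "0" (case-insensitive).
_PATTERN = re.compile(r"^(?:visit|v)?\s*(\d+)$", re.IGNORECASE)


def normalize_visit(label: str) -> str:
    """Map a local visit label to the (hypothetical) canonical form."""
    text = " ".join(label.lower().split())  # trim and collapse whitespace
    if text in NAMED_VISITS:
        return CANONICAL_FORMAT.format(n=NAMED_VISITS[text])
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"unrecognized visit label: {label!r}")
    return CANONICAL_FORMAT.format(n=int(match.group(1)))
```

Applied to the six variants on the slide, this returns the same label for each. A label it does not recognize raises an error rather than being guessed, which keeps unmapped local terminology visible instead of silently passing it downstream.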
Field report from Stanford: In the Casale Study the common mapping column is "Study Collection Day", which was not represented in the protocol table. In the protocol, "Study Collection Day" was summarized to VISIT, as it occurred within a window of +/- 5 days across centers. However, the VISIT column was missing from most of the datasets.
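The missing VISIT column could in principle be recovered from "Study Collection Day" by applying the +/- 5 day window described in the field report. The sketch below shows the idea under stated assumptions: the schedule (visit names and nominal study days) is invented for illustration and is not taken from the Casale protocol, and `visit_for_day` is a hypothetical helper name.

```python
from typing import Dict, Optional

# Hypothetical schedule: visit label -> nominal study day (NOT from the protocol).
SCHEDULE: Dict[str, int] = {"Visit 0": 0, "Visit 1": 7, "Visit 2": 35}

# The +/- 5 day window mentioned in the field report.
WINDOW_DAYS = 5


def visit_for_day(study_day: int,
                  schedule: Dict[str, int] = SCHEDULE,
                  window: int = WINDOW_DAYS) -> Optional[str]:
    """Return the visit whose nominal day is closest to study_day,
    provided it lies within the window; otherwise return None."""
    best = None  # (distance, visit label)
    for visit, nominal_day in schedule.items():
        distance = abs(study_day - nominal_day)
        if distance <= window and (best is None or distance < best[0]):
            best = (distance, visit)
    return best[1] if best is not None else None
```

Collection days that fall outside every window return `None`, so they can be flagged for manual review rather than forced onto the nearest visit.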
When is “Ragweed Season”?
Issue: Mapping the allergy score to mechanistic/immunological assays. The raw allergy severity scores for individual symptoms were used to determine the primary ragweed season and eventually to map VISIT and WEEK.
According to Fig 1, ragweed season starts at ~Week 5 and ends at ~Week 12. According to Fig 3, the number of days in the primary ragweed season is unclear. The x-axis is different in these two figures. As a result we are unsure when the ragweed season started during the study and how to map the Study Collection Day to VISIT days and Week.
As per protocol, all data are mapped by Visit and Week, whereas the raw data in most cases provide ONLY a collection study day mapping.
Mappings between protocol, lab tests and mechanistic assays were missing
– Allergy Score (Study Collection Day)
– Lab Tests (Study Time collected)
– Microarray Data (only Visit)
– Flow (Collection_Study_day and Visit)
Study Assessment data successfully aligned by Subject ID, Arm and Collection Study Day or Visit
[Figure: number of assessments per data collection day and visit for Severity Score, Flow Cytometry, Microarray and Serum Analytes (e.g. IgE, IgM). Timeline labels: Wk -9 to -10, Wk 0, Wk 1, Wk 5, Wk 9, Wk 13; Screening; Omalizumab pre-treatment; Rush RIT; Omalizumab/Placebo + Immunotherapy/Placebo]
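The alignment described on this slide amounts to grouping records from different assays under a shared key. The following sketch illustrates this with plain dictionaries; the field names (`subject_id`, `arm`, `study_day`, `value`) and the function name `align` are assumptions for illustration and do not reflect the actual ImmPort or study schemas.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Key = Tuple[str, str, int]  # (subject ID, arm, collection study day)


def align(datasets: Dict[str, List[dict]]) -> Dict[Key, Dict[str, object]]:
    """Group records from several assays under (subject_id, arm, study_day).

    `datasets` maps an assay name to its list of records; each record is a
    dict with the hypothetical fields subject_id, arm, study_day and value.
    """
    aligned: Dict[Key, Dict[str, object]] = defaultdict(dict)
    for assay, records in datasets.items():
        for record in records:
            key = (record["subject_id"], record["arm"], record["study_day"])
            aligned[key][assay] = record["value"]
    return dict(aligned)


# Made-up example data, for illustration only.
example = {
    "severity_score": [
        {"subject_id": "S1", "arm": "A", "study_day": 0, "value": 2.5},
    ],
    "flow_cytometry": [
        {"subject_id": "S1", "arm": "A", "study_day": 0, "value": "panel-1"},
        {"subject_id": "S1", "arm": "A", "study_day": 7, "value": "panel-1"},
    ],
}
aligned_example = align(example)
```

Keys that carry only one assay after alignment (here, study day 7) show where a measurement has no counterpart in the other datasets, which is the kind of gap the field report describes.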
Libraries
ImmPort Templates: Race (displaySubmitTemplates.do)
Libraries
– some libraries already exist (SDTM, ImmPort templates)
– CTMS field values
ImmPort Templates: How to specify Race if Race = ‘Other’?
ImmPort Templates: How to specify “Subject Phenotype”?
Activities Library
ImmPort Antibody Registry (Diehl et al.), from BD Lyoplate Screening Panels: Human Surface Markers
Discoverability
Stakeholders involved in making the needed changes
– NIAID, FDA
– Northrop Grumman
– Stanford
– Buffalo
– Rho
– Medidata Rave (CTMS vendors)
– Labkey (ELN vendors)
Pipeline is siloed by heterogeneous terminologies and standards
– perform study & collect (compile) data
– analyze data (SAS …)
– submit data to ImmPort
– process & de-identify data in ImmPort
– discover, aggregate, analyze data in ImmPort
How to break down terminology-created silo walls?
– standards and software: CDISC, SDTM, EHRs, ELNs, CTMSs
– standards and software: ImmPort templates, html, SQL (MySQL) …
– immune-related ontologies: GO, PRO, CL, … (these ontologies are already being used by Max and Mindy)
– CDISC2RTF
Goals of ImmPort
– Accelerate a more collaborative and coordinated research environment
– Create an integrated database that broadens the usefulness of scientific data
– Advance the pace and quality of scientific discovery
– Integrate relevant data sets from participating laboratories, public and government databases, and private data sources
– Promote rapid availability of important findings
– Provide analysis tools to advance immunological research
Improve immunology research through enhanced
– Collaboration
– Coordination
– Discoverability
– Integration
– Analyzability
Hypothesis: all of these ends will be promoted by describing ImmPort data using terms from shared high-quality ontologies