Update of the TCIA Imaging Data Pilot
TCIA Background
The Cancer Imaging Archive (TCIA), established 2010
Open-access, de-identified imaging data
Covers most modalities (CT/MR/PET/RT)
Wide variety of cancers, plus phantoms
Patient populations vary from a few to >26,000 (NLST)
Many collections have associated metadata:
  Demographics, outcomes, therapy
  Pathology histology imaging
  Radiologist expert and AI analyses
  ‘Omics via TCGA, CPTAC, and GEO
Robust, multistep de-identification for DICOM data, and IRB approvals
Abbreviations:
  TCGA: The Cancer Genome Atlas
  GDC: Genomic Data Commons
  CPTAC: Clinical Proteomic Tumor Analysis Consortium
  GEO: Gene Expression Omnibus
http://www.cancerimagingarchive.net
Access
Fully public (no registration): 55 collections
  TCGA: 21
  Phantom: 6
  With associated metadata: 42
Limited access: 12 collections
  Protects data while investigators are publishing results
  Data collection for private or internal projects
  Mechanism to request access
  Limited data sets: QIN data sets (6 of 10 limited), NLST
A growing user community
33,000 total subjects in the archive
67 data sets currently available
  21 from The Cancer Genome Atlas project
  10 from the Quantitative Imaging Network
  Clinical trial data from ECOG-ACRIN and RTOG
  6 new data sets added/updated in 2016
363 publications based on TCIA data
  70 new publications in 2016

As the network became richer and better known, and with the connection to TCGA data, publications have increased exponentially. Note that this count covers only the publications we know about: users are not obliged to tell us of publications, and only professional ethics require that they acknowledge us. 81 of the 363 are NLST-related. The detailed listing of those publications is stored at https://biometry.nci.nih.gov/cdas/nlst/ (TCIA links out to this page from its publications page), and they all appear to be peer-reviewed articles. Of the 282 listed non-NLST publications, 62 are conference proceedings and the remaining 220 are peer-reviewed journal articles.
A growing user community
TCIA now has over 4,000 active users per month
Apollo Pilot
Collaboration with Boston VA POP
Initial contacts in June and July, before we knew about Apollo
Initiated discussions with Louis Fiore. Initial plan:
  Centralize imaging data at VA-MAVERIC: transfer images from medical centers on the internal VA system
  Install TCIA client software if permitted (yes)
  De-identify and apply unique identifiers at VA-MAVERIC with TCIA assistance
  Transfer imaging data to TCIA, curate, and post publicly
  Curate and transfer clinical research CRF data to TCIA
  Imaging data processed and analyzed by volunteer research groups
  Analysis results made available on TCIA and added back to the VA rePOP database
POP = Precision Oncology Program
MAVERIC = Massachusetts Veterans Epidemiology Research and Information Center
After the APOLLO meeting on Aug 29
Proposed the small POP data set as a pilot for the data processes needed for APOLLO
Discussed with Fiore and his team on Aug 30
Identified the cases and key team members
31 cases fully consented (16 confirmed, 15 additional in pipeline)
IRB authorized
Medical images and histology images to go to TCIA
Genomic and clinical data to go to GDC (TCIA will support for the pilot)
Imaging Workflow
1. Imaging data retrieved from PACS at VA Medical Centers
2. Moved to MAVERIC file system
3. Curation at MAVERIC (with TCIA assistance)
4. Submission to TCIA
5. Secondary review by TCIA
6. Data made publicly available to researchers
7. Quantitative analysis (imaging linked by Master Patient ID to clinical data and genomics)
Curation and review checks: apply Master patient ID, apply date offset, remove known PHI, visual review, private tag review
Analysis types: volume analysis, radiomics, radiogenomics
[Diagram labels: Site A, Site B, Site C, MAVERIC, TCIA Input System (access restricted), TCIA Production]
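The curation checks named in this workflow (Master patient ID, date offset, removal of known PHI, private tag review) can be sketched on a simplified header as follows. This is a minimal illustration under stated assumptions, not the pilot's actual configuration: the header is a plain dictionary rather than a real DICOM object, the field lists `KNOWN_PHI_FIELDS` and `DATE_FIELDS` are hypothetical examples, and the `Private:` key prefix is an invented convention for marking private tags.

```python
from datetime import datetime, timedelta

# Hypothetical field lists for this sketch; a real profile would be much longer.
KNOWN_PHI_FIELDS = {"PatientName", "PatientBirthDate", "PatientAddress",
                    "ReferringPhysicianName"}
DATE_FIELDS = {"StudyDate", "SeriesDate", "AcquisitionDate"}


def shift_date(yyyymmdd, offset_days):
    """Shift a DICOM-style YYYYMMDD date string by a number of days."""
    shifted = datetime.strptime(yyyymmdd, "%Y%m%d") + timedelta(days=offset_days)
    return shifted.strftime("%Y%m%d")


def deidentify(header, master_id, offset_days):
    """Return (clean_header, private_keys_for_review) for one image header."""
    clean = {}
    needs_review = []
    for key, value in header.items():
        if key in KNOWN_PHI_FIELDS:
            continue  # remove known PHI
        if key.startswith("Private:"):
            needs_review.append(key)  # hold back for private tag review
            continue
        if key in DATE_FIELDS:
            value = shift_date(value, offset_days)  # apply date offset
        clean[key] = value
    clean["PatientID"] = master_id  # apply Master patient ID
    return clean, sorted(needs_review)


header = {
    "PatientName": "DOE^JOHN",
    "PatientID": "123456",
    "StudyDate": "20160105",
    "Modality": "CT",
    "Private:0029,1010": "vendor data",
}
clean, needs_review = deidentify(header, "POP-0001", -30)
```

In this sketch private tags are dropped from the output and returned separately, so that a person can decide which of them are safe to keep. Visual review of pixel data for burned-in text is a manual step and is not modeled here.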
Team members (imaging only)
TCIA: Freymann, Kirby, Jaffe, TCIA staff at UAMC
VA MAVERIC:
  Louis Fiore, Executive Director
  Dr. Valmeek (Vik) Kudesia, Director of Informatics
  Luis E. Selva, Ph.D., Health Informatics Systems
  Samuel Ajjarapu, Saiju Parajan, Informatics
  Dr. Andrew Zimolazk, MD/Informatician
  Danne Elbers, Informatics Technical Lead
Status 9/8/2016
CTP local installation is being set up at the VA MAVERIC, with an imaging workflow in place there
De-ID configuration draft: date offset and MasterCase ID to be applied across data types
Cases will be processed as assembled, to shake out the bugs
Anticipate the first imaging case ready to send at VA MAVERIC in 2 weeks
CTP = RSNA MIRC Clinical Trials Processor
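To illustrate why one date offset and one MasterCase ID per case must be shared across data types, the sketch below derives both from a keyed hash of the local patient identifier, so imaging, clinical, and genomic records for the same patient receive identical values. This is an assumption made for illustration only: the slides do not say how the offset or ID is generated, this is not a description of how CTP works, and `case_keys`, the `CASE-` prefix, and the secret are hypothetical.

```python
import hashlib
import hmac
from datetime import date, timedelta


def case_keys(secret, local_patient_id, max_offset_days=365):
    """Derive (master_case_id, offset_days); the same inputs always give the same outputs."""
    digest = hmac.new(secret, local_patient_id.encode("utf-8"),
                      digestmod=hashlib.sha256).digest()
    master_case_id = "CASE-" + digest[:6].hex().upper()
    # Negative offset of 1..max_offset_days days, fixed for this patient.
    offset_days = -(1 + int.from_bytes(digest[6:10], "big") % max_offset_days)
    return master_case_id, offset_days


def shift(d, offset_days):
    """Apply the per-case offset to a date from any data type."""
    return d + timedelta(days=offset_days)


secret = b"example-only-secret"  # hypothetical; would stay inside the submitting site
case_id, offset = case_keys(secret, "local-patient-42")

scan_date = date(2016, 3, 1)     # from the imaging record
biopsy_date = date(2016, 3, 15)  # from the clinical record
interval = (shift(biopsy_date, offset) - shift(scan_date, offset)).days
```

Because both dates move by the same number of days, the interval between scan and biopsy is preserved after de-identification, which keeps the shared records usable for longitudinal analysis.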
Sociology of data sharing?
Why no barriers at all should be erected to sharing
Lowering barriers to data access: easier to search
Sept. 2014: removed the login requirement for accessing public data
  Note that the previous login requirement was essentially anonymous
  Login is still required for non-public data
[Chart: number of TCIA search queries (bi-weekly)]
Lowering barriers to data access: easier to download
February 2016: simplified the user interface and added a download button for each collection that had a description
[Chart: gigabytes downloaded (bi-weekly)]