First teleconference/web session Dec 11, 2015 Working Group 6 Criteria for Repository Inclusion: Standards, Interoperability, Sustainability, etc.
Agenda
  1. Introductions
  2. Brief review of bioCADDIE
  3. Goals for this group
       Timeline
       Discuss and check for understanding
  4. Repositories/sources already indexed
  5. The list of potential repositories/sources
  6. Reactions, comments
WG6 – Current Membership
  George Alter - ICPSR, University of Michigan
  Dianne Babski - National Library of Medicine
  Tanya Barrett - NCBI (GEO, BioSample, BioProject), GA4GH
  Kei Cheung - Yale University
  Tim Clark - Harvard Medical School, FORCE11 Data Citation Implementation Group
  Larry Clarke - National Cancer Institute (NCI)
  Ian Fore - NIH
  Marek Grabowski - University of Virginia
  Jeffrey Grethe - University of California San Diego
  Chelsea Ju - University of California Los Angeles
  Chirag Lakhani - Harvard Medical School
  Matthew McAuliffe - Center for Information Technology NIH
  Neil McKenna - Baylor College of Medicine
  Lucila Ohno-Machado - University of California San Diego
  Thomas Radman - NIH
  Jim Rehg - Georgia Institute of Technology
  Susanna-Assunta Sansone - University of Oxford and Nature Publishing Group
  Alisa Surkis - New York University School of Medicine
  Griffin Weber - Harvard Medical School
  Justin Wood - University of California Los Angeles
  John Yates - The Scripps Research Institute
  Wenchao Yu - University of California Los Angeles
Supported by the NIH grant #xxxxxxxxx to the University of California, San Diego
Introductory Logistics
  bioCADDIE Web Site: https://biocaddie.org
  White paper: under Resources
  Working Groups menu: Working Group 6
  Or Google “biocaddie wg6”!
bioCADDIE – Working Groups location on the web site
User Interface Prototype UI webpage address: datamed.biocaddie.org User name: biocaddie Password: biocaddie
WG6 - Goals
GOAL
  Obtain consensus from multiple NIH representatives on which data sets NIH wants to see indexed by bioCADDIE
  Determine which features (criteria) these sets have in common for future selection of data sets
  Determine process of review of criteria for newly proposed datasets
ACTIVITIES/RESPONSIBILITIES
  Assemble an authoritative group of a minimum of 4 NIH officers and 4 bioCADDIE executive committee members to discuss criteria to select repositories using the DDI prototype
  Decide which repositories will be used for the prototype
DELIVERABLES
  Recommended metadata for data inclusion
  Contact information for NIH-selected data sets
  Standard for persistence and preservation
  Investigation of access requirements
Balancing Act for Criteria
  What can researchers/repositories provide?
  Which criteria will program officers endorse?
  Metadata quality
Criteria for Inclusion in the Initial Prototype
  Key data resources used by the community
  Aligns with concept of the Commons Pilots
  Cancer Genomics Cloud Pilots and Genomic Data Commons
  Human Microbiome Project
  Model organism databases
  Facilitate development of indexing methodology
  Ensure broad coverage of types of data
  Examples not yet represented: clinical data, imaging data
  “The long tail”?
    The hardest to find
    The Variety component of big data
Notes: There is good convergence between the bioCADDIE emphasis on indexing highly accessed datasets and the idea of the Commons. The Cancer Genomics Cloud Pilots and the Genomic Data Commons both seek to make available data that are highly used by cancer researchers. These will not be in the cloud funded under the credits model, but they are part of a broader Commons.
NIH representation on WG6
  Represent NIH repositories
    Intramural and extramural
    Facilitate collaborations
  Represent NIH program staff
    Criteria they are willing to endorse in their programs
Background relevant to WG6
  Metadata specification
    Can repositories provide this?
  Identifiers
    A non-prescriptive approach
    Does it work for repositories?
  Core Development work
    Pipeline for reading data from repositories
  Material on all the above on website
    How do we supplement this?
Repositories already in progress
  Source: Status
  PDB, GEO: Stable
  BioProject, ArrayExpress, dbGaP, GEMMA: Ongoing
  Library of Integrated Network-based Cellular Signatures (LINCS) program: Reviewing API details for structure
  Inter-university Consortium for Political and Social Research (ICPSR): Reviewing sample file for structure
Jeff to speak to this.
Future
  The next 10 repositories
  The overall list of repositories
    In Google Docs
Thank you! Questions?