biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE)
NIH Core Team: Ron Margolis (Lead); Ian Fore (Science Officer); Dawei Lin & Alison Yao (Program Officers); Jennie Larkin (ADDS office liaison)
RDA BOF on Data Search, 3/1/16
NIH grant 1U24 AI to the University of California, San Diego
Aims: “PubMed” for Data
1. Help users find shared data
2. Build a prototype data discovery index
3. Evaluate requirements for next phase
FAIR: Findability, Accessibility, Interoperability, Reusability
White Paper: finalized 3/8/2015 (can be found at biocaddie.org)
The Concepts of DDI (from Lucila Ohno-Machado)
Organizing framework and portal for data
[Figure: DDI concept diagram. Labels: Data; Digital objects]
Legend:
dashed lines: mapping of metadata, standards, links to aggregators
aggregators: various indices whose metadata are or can be mapped into Commons metadata
Open Community Participation
Working Groups
Pilot Projects/RFAs to the Community
Supplements
[Figure labels: Identifiers; Metadata]
Data Indexing Pipeline
1. Configuration file developed by curator
2. Extraction of metadata/data from data resource or dataset via ingestion module
   - Cache information for further processing
3. Process metadata/data via sequential set of processing modules, e.g. ID conversion, keyword extraction, data normalization
4. Mapping of metadata/data to metadata model(s)
5. Export to target endpoint(s) via export modules
6. Search via Elasticsearch APIs
From Jeff Grethe
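The six pipeline steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the bioCADDIE implementation: the configuration keys, function names, and sample record are all hypothetical, the "endpoint" is a plain list, and the search step is a naive in-memory keyword match standing in for Elasticsearch.

```python
# Hypothetical sketch of the six pipeline steps; none of these names come
# from the bioCADDIE code base.

# Step 1: curator-written configuration (keys and values are assumptions).
CONFIG = {
    "source": "example-repository",
    "field_map": {"id": "ID", "name": "title", "tags": "keywords"},
}

# Sample input standing in for a data resource.
RAW = [{
    "id": "4iaq",
    "name": "Crystal structure of the chimeric protein of 5-HT1B-BRIL "
            "in complex with dihydroergotamine (PSI Community Target)",
    "tags": ["Signaling Protein", " GPCR Dock ", "signaling protein"],
}]


def ingest(raw_records, cache):
    """Step 2: extract records and cache copies for further processing."""
    for raw in raw_records:
        cache.append(dict(raw))
    return cache


def uppercase_id(record):
    """Step 3 module: a stand-in for ID conversion."""
    return {**record, "id": str(record["id"]).upper()}


def normalize_keywords(record):
    """Step 3 module: trim whitespace and drop case-insensitive duplicates."""
    seen, cleaned = set(), []
    for kw in record.get("tags", []):
        kw = kw.strip()
        if kw and kw.lower() not in seen:
            seen.add(kw.lower())
            cleaned.append(kw)
    return {**record, "tags": cleaned}


def map_to_model(record, field_map):
    """Step 4: rename source fields to the target metadata model."""
    return {dst: record[src] for src, dst in field_map.items() if src in record}


def export(records, endpoint):
    """Step 5: export to a target endpoint (here, just a list)."""
    endpoint.extend(records)
    return endpoint


def run_pipeline(raw_records, config, modules, endpoint):
    """Run steps 2 to 5 in order; modules are applied sequentially."""
    mapped = []
    for record in ingest(raw_records, []):
        for module in modules:
            record = module(record)
        mapped.append(map_to_model(record, config["field_map"]))
    return export(mapped, endpoint)


def search(index, term):
    """Step 6 stand-in: naive match on title or keywords, not Elasticsearch."""
    term = term.lower()
    return [
        r for r in index
        if term in r.get("title", "").lower()
        or any(term == k.lower() for k in r.get("keywords", []))
    ]


index = run_pipeline(RAW, CONFIG, [uppercase_id, normalize_keywords], [])
print(index[0]["ID"], index[0]["keywords"])
```

Keeping each processing module as a plain function from record to record is one way to make the "sequential set of processing modules" in step 3 easy to reorder or extend per data resource.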
datamed.biocaddie.org (v0.5)
Acknowledgements
Lucila Ohno-Machado (PI)
The bioCADDIE team
NIH Staff
The bioCADDIE ESP
Working Group experts
Collaborators
A DDI Example for PDB
{
  "dataItem": {
    "ID": "4IAQ",
    "title": "Crystal structure of the chimeric protein of 5-HT1B-BRIL in complex with dihydroergotamine (PSI Community Target)",
    "description": "5-hydroxytryptamine receptor 1B",
    "keywords": [ "Signaling Protein", "GPCR Dock" ],
    "dataTypes": [ "dataItem", "citation", "materialEntity", "organism", "identifiers" ]
  },
  "materialEntity": [
    {
      "name": "TYROSINE",
      "role": "chemical component",
      "formula": "C9 H11 N O3",
      "weight": " ",
      "type": "L-peptide linking"
    },
    …
[Slide labels: Identifier; Metadata]
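As a small illustration of how such a record might be inspected, the sketch below parses a closed-off copy of the fragment above and reports which of the names listed in "dataTypes" actually appear as top-level sections. Two assumptions are made here and are not confirmed by the slide: that "dataTypes" names the record's top-level sections, and that the fragment can be closed after the first "materialEntity" entry. The function name section_report is hypothetical.

```python
import json

# Closed-off copy of the slide fragment. Only fields shown on the slide are
# included; the elided remainder of the record is NOT reconstructed.
RECORD_JSON = """
{
  "dataItem": {
    "ID": "4IAQ",
    "title": "Crystal structure of the chimeric protein of 5-HT1B-BRIL in complex with dihydroergotamine (PSI Community Target)",
    "description": "5-hydroxytryptamine receptor 1B",
    "keywords": ["Signaling Protein", "GPCR Dock"],
    "dataTypes": ["dataItem", "citation", "materialEntity", "organism", "identifiers"]
  },
  "materialEntity": [
    {
      "name": "TYROSINE",
      "role": "chemical component",
      "formula": "C9 H11 N O3",
      "weight": " ",
      "type": "L-peptide linking"
    }
  ]
}
"""


def section_report(record):
    """Split the names declared in dataItem.dataTypes into present/missing.

    Assumption: each declared name corresponds to a top-level key.
    """
    declared = record.get("dataItem", {}).get("dataTypes", [])
    present = [name for name in declared if name in record]
    missing = [name for name in declared if name not in record]
    return present, missing


record = json.loads(RECORD_JSON)
present, missing = section_report(record)
print("present:", present)
print("missing:", missing)
```

For the truncated fragment, the sections after "materialEntity" show up as missing simply because the slide cuts the record off there.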