Reproducibility: A Funder and Data Science Perspective

Presentation transcript:

Reproducibility: A Funder and Data Science Perspective
Philip E. Bourne, PhD, FACMI, University of Virginia
Thanks to Valerie Florence, NIH, for some slides
http://www.slideshare.net/pebourne
peb6a@virginia.edu
NetSci Preworkshop 2017, June 19, 2017
6/19/17

Who Am I Representing And What Is My Bias?
- I am presenting my views, not necessarily those of NIH
- Now leading an institutional data science initiative
- Total data parasite
- Unnatural interest in scholarly communication
- Co-founder and founding EIC of PLOS Computational Biology – OA advocate
- Prior co-Director, Protein Data Bank
- Amateur student researcher in scholarly communication

Reproducibility is the responsibility of all stakeholders …

Let's start with researchers …

Reproducibility - Examples From My Own Work … And recently … Phew …
http://www.sdsc.edu/pb/kinases/
It took several months to replicate this work.

Beyond value to myself (and even then the emphasis is not enough), there is too little incentive to make my work reproducible by others …

Tools Fix This Problem, Right?
- Extracted all PMC papers with associated Jupyter notebooks available: approx. 100
- Took a random sample of 25
- Only 1 ran out of the box
- Several ran with minor modification
- Others lacked libraries, sufficient details to run, etc.
It takes more than tools. It takes incentives …
Daniel Mietchen, 2017, personal communication
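The audit described above (draw a random sample from the notebooks found, try each one, tally the outcomes) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used: `audit_sample`, `stub_runner`, the outcome labels, and the notebook identifiers are all hypothetical, and the stub does not execute anything.

```python
import random
from collections import Counter
from typing import Callable, Iterable, List, Tuple


def audit_sample(
    notebooks: Iterable[str],
    run_notebook: Callable[[str], str],
    sample_size: int = 25,
    seed: int = 0,
) -> Tuple[List[str], Counter]:
    """Draw a seeded random sample of notebooks and tally how each run ended.

    `run_notebook` is a caller-supplied function (hypothetical here) that
    tries one notebook and returns an outcome label such as
    "ran_as_is", "ran_after_fix", or "failed".
    """
    pool = sorted(notebooks)                # stable order so the seed is meaningful
    rng = random.Random(seed)               # fixed seed makes the sample itself reproducible
    sample = rng.sample(pool, min(sample_size, len(pool)))
    outcomes = Counter(run_notebook(nb) for nb in sample)
    return sample, outcomes


def stub_runner(notebook_id: str) -> str:
    # Placeholder only: a real runner would execute the notebook in a clean
    # environment and classify the result. This stub runs nothing.
    return "not_attempted"


# Hypothetical identifiers standing in for the notebooks that were found.
ids = [f"notebook-{i:03d}" for i in range(100)]
sample, outcomes = audit_sample(ids, stub_runner)
print(len(sample), dict(outcomes))
```

Fixing the seed and sorting the pool means the audit's own sample can be redrawn by someone else, which is the same property the audit is checking for in the notebooks.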

Funders and publishers are the major levers. What are funders doing? Consider the NIH …

NIH Special Focus Area
https://www.nih.gov/research-training/rigor-reproducibility

Outcomes – General …

Enhancing Reproducibility through Rigor and Transparency (NOT-OD-15-103)
Clarifies NIH expectations in 4 areas:
- Scientific premise: describe strengths and weaknesses of prior research
- Rigorous experimental design: how to achieve robust and unbiased outcomes
- Consideration of sex and other relevant biological variables
- Authentication of key biological and/or chemical resources, e.g., cell lines

Outcomes – network based …

Experiment in Moving from Pipes to Platforms
Sangeet Paul Choudary https://www.slideshare.net/sanguit

Commons & the FAIR Principles
- The Commons is a virtual platform physically located predominantly on public clouds
- Digital assets (objects) within that system are data, software, narrative, course materials, etc.
- Assets are FAIR – Findable, Accessible, Interoperable and Reusable
Bonazzi and Bourne 2017, PLoS Biol 15(4): e2001818
FAIR: https://www.nature.com/articles/sdata201618
https://www.workitdaily.com/job-search-solution/
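To make the idea of a FAIR digital asset concrete, here is a minimal Python sketch of a metadata record with one crude check per FAIR letter. It is an illustration only: the `DigitalObject` fields, the `fair_gaps` helper, and the example values are assumptions made for this sketch, not part of the Commons, and the published FAIR principles are considerably more detailed than a single field per letter.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DigitalObject:
    """Hypothetical metadata record for one asset (data, software, narrative ...)."""
    identifier: str = ""    # Findable: a persistent identifier
    access_url: str = ""    # Accessible: where the object can be retrieved
    data_format: str = ""   # Interoperable: a declared, shared format
    license: str = ""       # Reusable: explicit terms of reuse
    kind: str = "data"      # e.g. "data", "software", "narrative"


def fair_gaps(obj: DigitalObject) -> List[str]:
    """Return the FAIR facets for which the record has no metadata at all.

    An empty field is treated as a gap; a filled field is NOT proof that the
    facet is actually satisfied. This is a first-pass screen, nothing more.
    """
    checks = {
        "findable": obj.identifier,
        "accessible": obj.access_url,
        "interoperable": obj.data_format,
        "reusable": obj.license,
    }
    return [facet for facet, value in checks.items() if not value.strip()]


# Example values are placeholders, not real identifiers or locations.
example = DigitalObject(
    identifier="doi:10.0000/example",
    access_url="https://example.org/objects/1",
)
print(fair_gaps(example))
```

A screen like this only flags missing metadata; judging whether an identifier is truly persistent or a format truly interoperable still needs community-agreed standards.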

Just announced …
https://commonfund.nih.gov/sites/default/files/RM-17-026_CommonsPilotPhase.pdf
Bonazzi and Bourne 2017
FAIR: https://www.nature.com/articles/sdata201618

Current Data Commons Pilots
- Commons Platform Pilots: explore feasibility of the Commons Platform; facilitate collaboration and interoperability
- Cloud Credit Model: provide access to cloud via credits to populate the Commons; connecting credits to NIH grants
- Reference Data Sets: making large and/or high value NIH funded data sets and tools accessible in the cloud
- Resource Search & Index: developing data & software indexing methods; leveraging BD2K efforts (bioCADDIE et al.); collaborating with external groups

Commons - Platform Stack
- Compute Platform: Cloud or HPC
- Services: APIs, Containers, Indexing
- Software: Services & Tools, scientific analysis tools/workflows
- Data: “Reference” Data Sets, User defined data
- Digital Object Compliance
- App store/User Interface
https://datascience.nih.gov/commons

Mapping BD2K Activities to the Commons Platform
Platform stack:
- Compute Platform: Cloud or HPC
- Services: APIs, Containers, Indexing
- Software: Services & Tools, scientific analysis tools/workflows
- Data: “Reference” Data Sets, User defined data
- Digital Object Compliance
- App store/User Interface
Activities mapped onto the stack:
- BD2K Centers, MODS, HMP & Interoperability Supplements
- NCI & NIAID Cloud Pilots
- BioCADDIE/Other Indexing
- NIH + Community defined data sets
- possible FOAs and CCM
- Cloud credits model (CCM)
https://datascience.nih.gov/commons

Overarching Questions
- Is the Commons a step towards improved reproducibility?
- Is the Commons approach at odds with other approaches? If not, how best to coordinate?
- Do the pilots enable a full evaluation for a larger scale implementation?
- How best to evaluate the success of the pilots?

Other Questions
- Is a mix of cloud vendors appropriate?
- How to balance the overall metrics of success?
  - Reproducibility
  - Cost saving
  - Efficiency – centralized data vs distributed
  - New science
  - User satisfaction
  - Data integration and reuse – how to measure?
  - Data security
- What are the weaknesses?

Thank You

Acknowledgements
- Vivien Bonazzi, Jennie Larkin, Michelle Dunn, Mark Guyer, Allen Dearry, Sonynka Ngosso, Tonya Scott, Lisa Dunneback, Vivek Navale (CIT/ADDS)
- NLM/NCBI: Patricia Brennan, Mike Huerta, George Komatsoulis
- NHGRI: Eric Green, Valentina di Francesco
- NIGMS: Jon Lorsch, Susan Gregurick, Peter Lyster
- CIT: Andrea Norris
- NIH Common Fund: Jim Anderson, Betsy Wilder, Leslie Derr
- NCI Cloud Pilots/GDC: Warren Kibbe, Tony Kerlavage, Tanja Davidsen
- Commons Reference Data Set Working Group: Weiniu Gan (HL), Ajay Pillai (HG), Elaine Ayres (BITRIS), Sean Davis (NCI), Vinay Pai (NIBIB), Maria Giovanni (AI), Leslie Derr (CF), Claire Schulkey (AI)
- RIWG Core Team: Ron Margolis (DK), Ian Fore (NCI), Alison Yao (AI), Claire Schulkey (AI), Eric Choi (AI)
- OSP: Dina Paltoo