Reproducibility: A Funder and Data Science Perspective

1 Reproducibility: A Funder and Data Science Perspective
Philip E. Bourne, PhD, FACMI, University of Virginia
Thanks to Valerie Florence, NIH, for some slides
NetSci Preworkshop 2017, June 19, 2017

2 Who Am I Representing And What Is My Bias?
I am presenting my views, not necessarily those of NIH
Now leading an institutional data science initiative
Total data parasite
Unnatural interest in scholarly communication
Co-founder and founding EIC of PLOS Computational Biology; OA advocate
Prior co-Director of the Protein Data Bank
Amateur student researcher in scholarly communication

3 Reproducibility is the responsibility of all stakeholders …

4 [image-only slide]

5 Let's start with researchers …

6 Reproducibility - Examples From My Own Work
… And recently … Phew … It took several months to replicate this work.

7 Beyond its value to myself (and even then the emphasis is not enough), there is too little incentive to make my work reproducible by others …

8 Tools Fix This Problem, Right?
Extracted all PMC papers with associated Jupyter notebooks available: approx. 100
Took a random sample of 25
Only 1 ran out of the box
Several ran with minor modification
Others lacked libraries, sufficient detail to run, etc.
It takes more than tools … it takes incentives
Daniel Mietchen 2017, personal communication
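A check like this is straightforward to automate. Below is a minimal sketch, assuming the sampled notebooks have already been harvested from PMC into a local notebooks/ directory; the directory name, sample size, and timeout are illustrative assumptions, not the code used in Mietchen's study.

```python
# Minimal sketch: try to execute a sample of downloaded notebooks and count
# how many run unmodified. Paths and limits are hypothetical.
import glob
import random
import subprocess

notebooks = glob.glob("notebooks/*.ipynb")
sample = random.sample(notebooks, min(25, len(notebooks)))

ran = 0
for nb in sample:
    try:
        # "jupyter nbconvert --execute" runs every cell; a non-zero exit
        # status means the notebook failed somewhere (missing library,
        # missing data file, hard-coded path, etc.).
        proc = subprocess.run(
            ["jupyter", "nbconvert", "--to", "notebook", "--execute", nb],
            capture_output=True,
            timeout=600,  # give up after 10 minutes per notebook
        )
        if proc.returncode == 0:
            ran += 1
    except subprocess.TimeoutExpired:
        pass  # treat a hung notebook as a failure

print(f"{ran} of {len(sample)} notebooks ran out of the box")
```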

9 Funders and publishers are the major levers. What are funders doing? Consider the NIH …

10 [image-only slide]

11 NIH Special Focus Area

12 Outcomes – General …

13 Enhancing Reproducibility through Rigor and Transparency NOT-OD-15-103
Clarifies NIH expectations in 4 areas:
Scientific premise: describe strengths and weaknesses of prior research
Rigorous experimental design: how to achieve robust and unbiased outcomes
Consideration of sex and other relevant biological variables
Authentication of key biological and/or chemical resources, e.g., cell lines

14 Outcomes – Network-based …

15 Experiment in Moving from Pipes to Platforms
Sangeet Paul Choudary

16 Commons & the FAIR Principles
The Commons is a virtual platform physically located predominantly on public clouds
Digital assets (objects) within that system are data, software, narrative, course materials, etc.
Assets are FAIR: Findable, Accessible, Interoperable, and Reusable
Bonazzi and Bourne 2017, PLoS Biol 15(4)
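To make FAIR concrete: below is a toy sketch of the kind of metadata record a Commons digital object might carry, with a crude completeness check. The field names and required-field list are illustrative assumptions, not an NIH or Commons specification.

```python
# Toy FAIR-style metadata record for a Commons digital object.
# All field names are hypothetical; real indexing efforts (e.g., bioCADDIE)
# define much richer schemas.
fair_record = {
    "identifier": "doi:10.1234/example.5678",       # Findable: persistent ID (made-up DOI)
    "title": "Example reference data set",
    "access_url": "https://example.org/data/5678",  # Accessible: standard retrieval protocol
    "format": "text/csv",                           # Interoperable: open, standard format
    "license": "CC-BY-4.0",                         # Reusable: explicit terms of reuse
    "provenance": "Produced by hypothetical BD2K pilot, 2017",
}

REQUIRED_FIELDS = ("identifier", "access_url", "format", "license")

def minimally_fair(record):
    """Crude check: does the record carry the fields each FAIR letter needs?"""
    return all(record.get(field) for field in REQUIRED_FIELDS)

print(minimally_fair(fair_record))  # True
```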

17 Just announced … (Bonazzi and Bourne 2017)

18 Current Data Commons Pilots
Commons Platform Pilots: explore feasibility of the Commons Platform; facilitate collaboration and interoperability
Cloud Credit Model: provide access to the cloud via credits to populate the Commons; connect credits to NIH grants
Reference Data Sets: make large and/or high-value NIH-funded data sets and tools accessible in the cloud
Resource Search & Index: develop data & software indexing methods, leveraging BD2K efforts (bioCADDIE et al.) and collaborating with external groups

19 Commons - Platform Stack
Compute Platform: cloud or HPC
Services: APIs, containers, indexing
Software: services & tools; scientific analysis tools/workflows
Data: "Reference" data sets; user-defined data
Digital Object Compliance
App store / User Interface
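As one illustration of how the Services (containers) and Software (tools) layers could work together, a Commons-style environment might launch an analysis tool as a container against a mounted reference data set. The image name, mount paths, and tool command below are all hypothetical; this sketches the pattern, not an actual Commons API.

```python
# Hypothetical sketch: run a containerized analysis tool over a reference
# data set. Image name, mount paths, and the tool's command line are made up.
import subprocess

IMAGE = "example.org/tools/variant-caller:1.0"  # hypothetical container image
DATA = "/commons/reference/genomes"             # hypothetical reference data mount
OUT = "/commons/users/alice/results"            # hypothetical user workspace

# Volume mounts keep the tool, its dependencies, and the data cleanly
# separated, which is what makes the same run repeatable elsewhere.
subprocess.run(
    [
        "docker", "run", "--rm",
        "-v", f"{DATA}:/data:ro",  # reference data, read-only
        "-v", f"{OUT}:/out",       # writable results directory
        IMAGE,
        "call-variants", "--input", "/data/sample.bam", "--output", "/out/calls.vcf",
    ],
    check=True,
)
```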

20 Mapping BD2K Activities to the Commons Platform
[Diagram: the platform stack from slide 19, annotated with the BD2K activities mapped onto it: BD2K Centers, MODS, HMP & Interoperability Supplements; NCI & NIAID Cloud Pilots; BioCADDIE/other indexing; NIH + community-defined data sets; possible FOAs; and the Cloud Credits Model (CCM)]

21 Overarching Questions
Is the Commons a step towards improved reproducibility?
Is the Commons approach at odds with other approaches? If not, how best to coordinate?
Do the pilots enable a full evaluation for a larger-scale implementation?
How best to evaluate the success of the pilots?

22 Other Questions
Is a mix of cloud vendors appropriate?
How to balance the overall metrics of success?
Reproducibility
Cost saving
Efficiency: centralized vs. distributed data
New science
User satisfaction
Data integration and reuse: how to measure?
Data security
What are the weaknesses?

23 Thank You

24 Acknowledgements
Vivien Bonazzi, Jennie Larkin, Michelle Dunn, Mark Guyer, Allen Dearry, Sonynka Ngosso, Tonya Scott, Lisa Dunneback, Vivek Navale (CIT/ADDS)
NLM/NCBI: Patricia Brennan, Mike Huerta, George Komatsoulis
NHGRI: Eric Green, Valentina di Francesco
NIGMS: Jon Lorsch, Susan Gregurick, Peter Lyster
CIT: Andrea Norris
NIH Common Fund: Jim Anderson, Betsy Wilder, Leslie Derr
NCI Cloud Pilots/GDC: Warren Kibbe, Tony Kerlavage, Tanja Davidsen
Commons Reference Data Set Working Group: Weiniu Gan (HL), Ajay Pillai (HG), Elaine Ayres (BITRIS), Sean Davis (NCI), Vinay Pai (NIBIB), Maria Giovanni (AI), Leslie Derr (CF), Claire Schulkey (AI)
RIWG Core Team: Ron Margolis (DK), Ian Fore (NCI), Alison Yao (AI), Claire Schulkey (AI), Eric Choi (AI)
OSP: Dina Paltoo

