Jarek Nabrzyski Director, Center for Research Computing

Presentation transcript:

Workshop: Container Strategies for Data and Software Preservation that Promote Open Science. Jarek Nabrzyski, Director, Center for Research Computing, University of Notre Dame, naber@nd.edu

Q: Preservation for what? A: For reproducibility/reuse/replicability/r… in computational science

Science and the digital age: Science is the mother of the digital age. However, ever since CERN created the open internet, science has struggled to go digital and to go open. What is open science and why is it important?

What is open science? The term refers to efforts by researchers, governments, research funding agencies and the scientific community itself to make the primary outputs of publicly funded research results – publications and the research data (and software if possible) – publicly accessible in digital format with no or minimal restriction as a means for accelerating research. These efforts are in the interest of enhancing transparency and collaboration, and fostering innovation.

Scientific Ideals: Innovative ideas. Reproducibility (the cornerstone of the scientific method). Accumulation of knowledge. We want to believe that science is accumulating knowledge about interesting, real phenomena. But is that really the case? How much can we trust the knowledge that has been accumulated based on published findings?

Unfortunately, it has become apparent over the last few years that perhaps the answer to that question is not all that much. Now, there have been some very prominent cases in the past few years of outright fraud, where people have completely fabricated their data, but I’m not talking about those cases. What I’m talking about is the general sense that many scientific findings in a wide variety of fields don’t replicate, and that the published literature has a very high rate of false positives in it. So if a large proportion of our published results aren’t replicable, and are potential false positives, are we actually accumulating knowledge about real phenomena? I would suggest that the answer

Challenges: Lack of documentation of the workflow. Lack of transparency across the workflow. Lack of discoverability, especially of unpublished work. Hard to recover the context of experiments.

What do we do about it?

ND’s efforts to promote Open Science: DASPOS (Data and Software Preservation for Open Science). National Data Service. Collaboration on the Open Science Framework with the Center for Open Science. Series of workshops.

DASPOS Project www.daspos.org

DASPOS: Data And Software Preservation for Open Science. A multi-disciplinary effort funded by NSF. Notre Dame, Chicago, UIUC, Washington, Nebraska, NYU (Fermilab, BNL). Links the HEP effort (DPHEP + experiments) to Biology, Astrophysics, and Digital Curation. Includes physicists, digital librarians, and computer scientists. Aims to achieve some commonality across disciplines in metadata descriptions of archived data: what’s in the data, and how can it be used? Computational description (ontology development): how was the data processed? Can computation replication be automated? Impact of access policies on preservation infrastructure.

Digital Librarian Expertise: How to catalogue and share data. How to curate and archive large digital collections. Ontology/metadata expertise. Computer Science Expertise: How to build databases and query infrastructure. How to preserve software and functionality. How to develop distributed storage networks. Particle Physics and other Science Expertise: What does the data mean? How was it processed? How will it be re-used?

Reproducibility defined: Reproducibility is the ability to independently come to the same scientific conclusions as another researcher, potentially using different data sets or different methods. Based on: “Reproducible Research,” Comput. Sci. Eng., vol. 12, no. 5, pp. 8–13, Sep. 2010.

Curation Challenge

Workshop Goals: Identify opportunities and challenges with using containers to preserve science by bringing together… computer scientists, librarians and domain scientists… We believe we can do a lot together to support scientific integrity and open science efforts… knowing that… reproducibility is not about technology only.