Data Management Plans for SNSF applications


1 Data Management Plans for SNSF applications
Stephan Egli :: IT Project Manager Photon Science
Ines Günther-Leopold :: Directorate Support, Science
Laura Heyderman :: President of the FoKo
Gerd Mann :: Head IT and Head IT Department
Knud Thomsen :: Secretary FoKo
Mirjam van Daalen :: Chief of Staff Photon Science Division

2 Agenda
Time :: Topic :: Presenter
11:00 – 11:05 :: Introduction :: Laura Heyderman, Gerd Mann
11:05 – 11:25 :: SNSF Open Research Data Policy :: Sarah Gerster (SNSF), Cornélia Sommer (SNSF)
11:25 – 11:45 :: PSI Data Policy and Recommendations for Data Management Plans :: Mirjam van Daalen
11:45 – 11:55 :: PSI Data Management Solutions and Survey :: Gerd Mann
11:55 – 12:10 :: Data Management and Data Life Cycle at SLS and SwissFEL :: Stephan Egli
12:10 – 12:30 :: Q&A Session :: Knud Thomsen

3 Introduction
Starting October 2017, SNF requests Data Management Plans (DMPs) as an integral part of research proposals.
DMPs must be submitted together with the proposal via the mySNF web form.
DMPs can/must be updated until the end of the project.
Additional funds for data management can be requested (up to 10 kCHF).
DMPs are not part of the scientific evaluation of a proposal; however, SNF will check the plausibility of DMPs.

4 Introduction
With this info event we want to provide information and guidance concerning these new requirements.
Sarah Gerster and Cornélia Sommer from SNSF have joined us today to present the SNSF Open Research Data Policy and to answer questions in the last part of today's session.


7 Data policy for PSI research data
Mirjam van Daalen :: Paul Scherrer Institut

8 Data Policy for PSI Research Data
Within a working group, delegates of all PSI departments developed a data policy for PSI research data. The policy was approved by the DIRK in August 2016. The data policy is available as general directive "Allgemeine Weisung" AW at and is valid as of now.


10 Data Policy for PSI Research Data
This document adheres to the Guidelines Research Integrity at PSI ( ) It is part of the PSI information governance policy framework Informationssicherheit am PSI AW

11 Data policy for PSI research data
Based on the PaNdata Data Policy (Deliverable D2.1 of the PaNdata Europe FP7 project, 2011) and on the ESRF data policy of November 2015.
The policy addresses the issues of:
- Data ownership
- Data curation
- Data archiving
- Open data access

12 Data policies at other Research Institutes
Neutrons:
- ILL: PaNdata data policy since 2012
- ISIS: PaNdata data policy since 2012
Photons:
- Elettra: PaNdata data policy since 2013
- ESRF: PaNdata data policy since November 2015
- HZB: PaNdata data policy since June 2016
- MAX IV: PaNdata data policy since 2015
- HZDR: PaNdata data policy since June 2017
- ALBA: PaNdata data policy proposed
- EUXFEL: PaNdata data policy proposed
- DESY: PaNdata data policy proposed

13 PSI research data policy – main elements
Raw data and associated metadata:
- PSI departments are the custodians of raw data and metadata.
- PSI will automatically collect metadata for all experiments.
- PSI will store metadata in a metadata catalogue (defined in the DaaS project).
- The experimental team has sole access to the data during the so-called embargo period (3 years); a request to extend the embargo period can be made.
- After the embargo, PSI will make the data open access.
- Proprietary data belong by default to the PI and will be removed from PSI disk after the experiment unless otherwise agreed.

14 PSI research data policy – main elements
Raw data and associated metadata:
- High-level metadata such as title, authors, abstract and specific research infrastructure will be made public as soon as the experiment has been carried out. This information will be available via the persistent identifier landing page on the web.
- Raw data and metadata explicitly used for a peer-reviewed publication will become open access at the time of such publication.
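The release rules on slides 13 and 14 can be read as a small date calculation. The following is a minimal Python sketch, not PSI's implementation. It assumes the embargo clock starts on the experiment date (the slides do not say when it starts), and the names `add_years` and `open_access_date` are hypothetical.

```python
from datetime import date
from typing import Optional

EMBARGO_YEARS = 3  # default embargo period stated in the policy


def add_years(d: date, years: int) -> date:
    """Return the same calendar day `years` later (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def open_access_date(experiment_date: date,
                     publication_date: Optional[date] = None,
                     embargo_years: int = EMBARGO_YEARS) -> date:
    """Earliest date on which the raw data becomes open access.

    Assumption: the embargo starts on `experiment_date`.
    `publication_date` applies only to data explicitly used for a
    peer-reviewed publication; such data opens at publication time
    if that is earlier than the end of the embargo.
    """
    embargo_end = add_years(experiment_date, embargo_years)
    if publication_date is not None:
        return min(embargo_end, publication_date)
    return embargo_end


# Data taken on the day of this info event, with and without a publication.
print(open_access_date(date(2017, 9, 7)))
print(open_access_date(date(2017, 9, 7), publication_date=date(2018, 5, 1)))
```

An extended embargo can be modelled by passing a larger `embargo_years`; the PI's right to release data early is not modelled here.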

15 PSI research data policy – main elements
Data access I
- Access to raw data and metadata will be via a searchable online catalogue (currently being developed within the DaaS project).
- Access to the PSI online catalogue will be restricted to its registered users.
- Access to proposals will only be provided to the experimental team and appropriate facility staff.

16 PSI research data policy – main elements
Data access II
- The PI/leader of the experimental team can transfer part or all of her/his rights to another registered person during the embargo period.
- The PI/leader of the experimental team has the right to create and distribute copies of the raw data.
- The PI/leader of the experimental team can make data public before the end of the embargo period.

17 PSI research data policy – main elements
Data Storage: A minimum of 5 years; PSI will strive for 10 years, depending on the type and volume of data concerned and the economic consequences associated with long-term data storage.

18 Implementation schedule
Implementation of the policy can start once the metadata catalogue software developed within the Data Analysis as a Service (DaaS) project is available.
Within the DaaS project, the following 3 beamlines will be attached to the metadata catalogue: TOMCAT, cSAXS, MX.
The DaaS project ends in October 2017.
After the DaaS project, the roll-out to attach the rest of the beamlines will start.
Furthermore, the functionality of the metadata catalogue will be extended within the metadata catalogue collaboration with ESS.

19 SNSF Data Management Plan – Questionnaire
Laura Heyderman :: President of the FoKo SNSF Data Management Plan – Questionnaire 09/2017

20 Data Management Plan – mySNF Form
Question :: Answers :: Task

1.1 What data will you collect, observe, generate or re-use?
- What type, format and volume of data will you collect, observe, generate or reuse?
- Which existing data (yours or third-party) will you reuse?
Answers:
- For SLS and SwissFEL: data format preferably HDF5
- For others: describe your specific approach
Task:
- Provide a description of data type, data format, data volume, data origin and their role in the project
- Open standards preferred

1.2 How will the data be collected, observed or generated?
- What standards, methodologies or quality assurance processes will you use?
- How will you organize your files and handle versioning?
Answers:
- Data management: data flow and data management with DaaS and PetaByte Archive
- Describe your specific approach
Task:
- Describe the process including QA aspects (use of standards or internal procedures, naming conventions, data management systems …)

1.3 What documentation and metadata will you provide with the data?
- What information is required for users (computer or human) to read and interpret the data in the future?
- How will you generate this documentation?
- What community standards (if any) will be used to annotate the (meta)data?
Answers:
- List attributes of the Meta Data Catalogue
Task:
- Describe all types of documents and provide basic instructions for users (metadata standard, generation of metadata, software version …)

21 Data Management Plan – mySNF Form
Question :: Answers :: Task

2.1 How will ethical issues be addressed and handled?
- What is the relevant protection standard for your data? Are you bound by a confidentiality agreement?
- Do you have the necessary permission to obtain, process, preserve and share the data? Have the people whose data you are using been informed or did they give their consent?
- What methods will you use to ensure the protection of personal or other sensitive data?
Answers:
- Where ethical aspects are relevant, describe details on permissions and measures to protect data
- Adhere to PSI regulations and refer to them: Research integrity at PSI – Guidelines for good scientific practice
Task:
- Point out concerns, if any, and how they are addressed

2.2 How will data access and security be managed?
- What are the main concerns regarding data security, what are the levels of risk and what measures are in place to handle security risks?
- How will you regulate data access rights/permissions to ensure the security of the data?
- How will personal or other sensitive data be handled to ensure safe data storage and transfer?
Answers (general):
- Raw data and metadata explicitly used for peer-reviewed publications will become open access at the time of the publication
- Access to other data will be granted after the embargo period has expired (provided no legal, regulatory or other restrictions exist)
- Access will be managed by the principal investigator on request
- Refer to PSI Data Policy (AW )
Task:
- Describe the process (main concerns: data availability, integrity, confidentiality)

2.3 How will you handle copyright and Intellectual Property Rights issues?
- Who will be the owner of the data?
- Which licenses will be applied to the data?
- What restrictions apply to the reuse of third-party data?
Answers:
- Primary data derived from research projects undertaken at PSI by PSI users remain the property of PSI unless otherwise agreed by contractual regulation with an external partner (cf. Research integrity at PSI – Guidelines for good scientific practice)
Task:
- Provide description and/or reference to doc.

22 Data Management Plan – mySNF Form
Question :: Answers :: Task

3.1 How will your data be stored and backed-up during the research?
- What is your storage capacity and where will the data be stored?
- What are the back-up procedures?
Answers:
- For SLS and SwissFEL: raw and derived data will be stored on tape in the PetaByte archive operated by PSI and CSCS. Default is without redundancy. For long-term storage, the option of redundant storage on a second tape at the same location is provided. Retention periods are 5 or 10 years, respectively.
- Others: data on managed storage devices will be backed up.
Task:
- Provide reference to installations and/or describe procedures

3.2 What is your data preservation plan?
- What procedures would be used to select data to be preserved?
- What file formats will be used for preservation?
Answers:
- Raw data, derived data and metadata will be archived
- Data format: preferably HDF5
- Describe specific data formats
Task:
- Give reference to storage provider and/or describe procedures used to select data to be preserved (selection criteria, reusability, costs, definition of responsible person …)

23 Data Management Plan – mySNF Form
Question :: Answers :: Task

4.1 How and where will the data be shared?
- On which repository do you plan to share your data?
- How will potential users find out about your data?
Answers:
- General: information on all PSI publications will be available via a Literature DB with a reference to raw and derived data (if applicable). Refer to PSI Data Policy (chapter 5).
- For SLS and SwissFEL: the PSI Meta Data Catalogue will provide information on available data and ownership. Data will be stored in the PetaByte Archive and be accessible via the Meta Data Catalogue on request.
- Others: describe your specific solution
  - Solutions provided by research communities (Prio 1)
  - Solutions provided by journals
  - Solutions provided by PSI (ongoing activities)
- Proposal: IT and FoKo make a recommendation concerning data repositories for others (e.g. Dryad, EUDAT, Harvard Dataverse, Zenodo [CERN], …); check re3data to select repositories by subject and level of trust

4.2 Are there any necessary limitations to protect sensitive data?
- Under which conditions will the data be made available (timing of data release, reason for delay if applicable)?
Answers / Task:
- If relevant, describe your specific restriction and related conditions

24 Data Management Plan – mySNF Form
Question :: Answers :: Task

4.3 I will choose digital repositories that conform to the FAIR Data Principles.
- Check box
- You can find certified repositories under

4.4 I will choose digital repositories maintained by a non-profit organisation.
- Radio button: yes/no
- If the answer is no: “Explain why you cannot share your data on a non-commercial digital repository.”
- Provide description of problem and/or reference to documentation


26 PSI Data Management Solutions and Survey
Gerd Mann :: Head IT and Head IT Department PSI Data Management Solutions and Survey

27 Provide Tools and Services
Proposal
Provide Guidance:
- Provide information on SNF requirements and inform about PSI data management policy
- Guideline for Data Management Plans (DMP)
Govern Process:
- Observe benchmark good / established practice (ETH, …)
- The aim is to use a pragmatic approach and not to set a ‘gold standard’
- Ensure basic compliance with defined rules and processes
Provide Tools and Services:
- Management of metadata, derived data, raw data, code, …
- SLS, SwissFEL, MEG II: Data Analysis as a Service with Data Catalogue and PetaByte Archive
- Domain Specific Community Solutions
- PSI Solutions (with partners)
Roles and responsibilities:
- Process ownership: FoKo
- Implementation of solutions: AIT

28 IT Solutions
Publication Management: publications linked to data via Digital Object Identifier
DaaS (Data Analysis as a Service): Data Catalogue, PetaByte Archive
Domain Specific Community Solutions
PSI Solution (with partners)
- Roll out DaaS to all beamlines (SLS, SwissFEL)
- Perform survey to identify solutions in use and demand
- Manage catalogue of solutions in use or recommended
- Evaluate solutions within ETH domain

29 Activities and Timelines
16/08/2017: Presentation DIRK
17/08/2017: Information to PSI scientists (PSI Aktuell [21/08/2017], to all line managers, intranet web page)
07/09/2017: Info event for PSI researchers (FoKo & IT) with SNF representative (Q&A session), 11:00–12:00 or 14:00–15:00 (Auditorium)
30/09/2017: Conduct survey on domain specific solutions at PSI
10/2017: Submit SNF proposal with data management plans
11/2017: Analysis of survey and selection of solutions
12/2017: Presentation of findings and recommended solutions to DIRK
Implementation of solutions and roll out DaaS for SLS beamlines

30 Goals & Scope & Opportunities
Goals:
- Overview of existing solutions and needs for action
- Prepare DOI integration of existing solutions
Scope:
- All labs/functions in scope of Open Data (SNF, EU, …) or with other needs for data management
Opportunities:
- Leverage existing solutions
- Minimize number of solutions used at PSI (incl. operational effort and cost)

31 Questionnaire Structure
Do you already use a data management solution fulfilling SNF DMP requirements? Yes / No
If yes:
- Which solution do you use?
- Describe: data types, data volumes (p.a.), metadata
- Do you have any restrictions concerning open data (legal, contractual, …) and how do you deal with them?
- Can you recommend your open data management solution to potential future users?
- Are you interested in sharing your experiences with those who are looking for a solution to provide their own data openly?
- Contact person?
- …
If no:
- Describe requirements concerning: data type, data volumes, metadata
- Do you have any restrictions concerning open data (legal, contractual, …)?
- Do you have preferred solutions?
- Contact person for requirements specification

32 Tool, Conduction and Time Lines
Findmind: online web tool
- Data in Switzerland
- Reports generated out of the survey
Conduct and timelines:
- A link to the survey tool will be sent to the audience by end of September
- Survey will be open till end of October
- Analysis to be done beginning of November
- Contact persons to be contacted for follow-up sessions till end of November
- Findings and recommendations for action will be reported to DIRK 12/2017


34 Data Management and Data Life Cycle at SLS and SwissFEL
Stephan Egli :: SYN/LSB :: Paul Scherrer Institut
Data Management Plans for SNSF applications, Sept 7th 2017

35 Overview Data Lifecycle Management and related Projects
- Data Catalogue Purpose
- Dataset Concept and Access Policy
- Impact on Beamline Managers (Exp. Responsibles) and Users
- Metadata Structure
- GUI Screenshots for data catalog
- Integration Challenges – A Common Journey

36 Data Lifecycle Management
PSI data policy defines (long term) goals concerning data storage, life cycle management, data access and ownership. Implementation of PSI data policy needs a data catalogue.

37 Data Management related Projects
Data Analysis as a Service (DaaS): focuses on offline data analysis and large offline disk storage. Finishes end of October 2017.
Petabyte Archive: focuses on enabling the data flows to and from long-term data storage at CSCS/Lugano. Finishes end of 2017.
Data Curation Project (collaboration with ESS): focuses on data catalog and data analysis automation. Started this year and will last until end of
Enabled to add dedicated manpower for data curation tasks.

38 Data Catalogue Purpose
Manage the metadata of raw and derived data taken at PSI's experiment facilities.
Metadata:
- administrative: data management lifecycle, ownership, file catalog
- scientific: describing the sample, beamline and experiment parameters relevant for the user's data analysis
Enables management of the lifecycle of the data, from creation through data analysis to eventual deletion:
- Data can be linked to proposals and samples
- Data can be linked to publications (DOI, PID)
- Data can be migrated to and from long-term storage on tape
Helps to keep track of data provenance (i.e. the steps leading to the final results).
Allows scientific integrity to be checked (checksum of data).
Allows data to be found based on the metadata (your own data and other people's public data).
In the long term:
- help to automate standardized analysis workflows
- support the standardization of data formats
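The integrity check mentioned on this slide (checksum of data) can be illustrated with a short Python sketch. It is an assumption-laden example, not the catalogue's actual mechanism: the slides do not name the hash algorithm, so SHA-256 is chosen here for illustration, and the function names `stream_checksum` and `is_intact` are hypothetical.

```python
import hashlib
import io


def stream_checksum(stream, algorithm="sha256", chunk_size=1 << 20):
    """Hash a binary stream chunk by chunk so large files need not fit in memory."""
    digest = hashlib.new(algorithm)
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def is_intact(stream, recorded_checksum, algorithm="sha256"):
    """Compare the current checksum with the one recorded at ingest time."""
    return stream_checksum(stream, algorithm) == recorded_checksum


# In-memory stand-in for a data file; real use would pass open(path, "rb").
payload = b"detector frame bytes"
recorded = stream_checksum(io.BytesIO(payload))

print(is_intact(io.BytesIO(payload), recorded))            # unchanged copy
print(is_intact(io.BytesIO(payload + b"x"), recorded))     # modified copy
```

Recording the checksum when a dataset is created and recomputing it after a round trip to tape is one way such a check could be used.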

39 Dataset Concept and Access Policy
- Metadata is linked to datasets, which are collections of files, e.g. all files produced during a data-taking run.
- Each dataset gets a globally unique persistent identifier (PID).
- Each dataset is uniquely assigned to one pgroup.
- Only members of the pgroup have access to the raw data and metadata belonging to the pgroup.
- The data becomes public only after the embargo period (typically 3 years).
- Pgroup membership can be defined via processes supported by the digital user office DUO (roles: BM, PI, MP).
- The pgroups are stored centrally in the AD identity management system of AIT.
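The access rule on this slide (pgroup members only, public after the embargo) can be sketched as a single predicate. This is a minimal Python illustration under stated assumptions: the `Dataset` class, its field names, the `can_access` function and the sample PID and pgroup values are all hypothetical, and the real system resolves pgroup membership through AD rather than a list passed in by the caller.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable


@dataclass(frozen=True)
class Dataset:
    pid: str           # globally unique persistent identifier
    pgroup: str        # the single pgroup the dataset is assigned to
    public_from: date  # first day after the embargo period (assumed field)


def can_access(dataset: Dataset, user_pgroups: Iterable[str], today: date) -> bool:
    """Members of the owning pgroup always have access; others only after the embargo."""
    if dataset.pgroup in set(user_pgroups):
        return True
    return today >= dataset.public_from


# Hypothetical dataset with a 3-year embargo ending on 2020-09-07.
ds = Dataset(pid="example/0001", pgroup="p00001", public_from=date(2020, 9, 7))

print(can_access(ds, ["p00001"], date(2017, 9, 8)))  # member during embargo
print(can_access(ds, ["p99999"], date(2017, 9, 8)))  # non-member during embargo
print(can_access(ds, [], date(2020, 9, 7)))          # anyone after embargo
```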

40 Impact on Beamline Managers (Exp. Responsibles)
- Beamlines will be connected individually, one by one.
- Each beamline will be connected to the data catalogue via dedicated "ingest" scripts.
- Goal: automate the metadata entry as much as possible.
- Beamline managers should help to define the metadata that should go to the data catalog.
- Beamline managers are invited to give feedback on the use of the data catalogue and to come up with ideas and suggestions for new use cases.
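To make the idea of an "ingest" script concrete, here is a minimal Python sketch of the metadata-assembly step only. It does not call any real catalogue API; the function name `build_dataset_record`, every field name and all sample values are hypothetical and chosen for illustration.

```python
import json
from datetime import datetime, timezone
from typing import Dict, List, Tuple


def build_dataset_record(pgroup: str,
                         beamline: str,
                         files: List[Tuple[str, int]],
                         scientific_metadata: Dict[str, object]) -> Dict[str, object]:
    """Assemble one catalogue entry from information the beamline already has.

    `files` is a list of (path, size in bytes) pairs for one data-taking run.
    """
    return {
        # administrative metadata
        "pgroup": pgroup,
        "beamline": beamline,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "number_of_files": len(files),
        "size_bytes": sum(size for _, size in files),
        "files": [{"path": path, "size_bytes": size} for path, size in files],
        # scientific metadata, defined per beamline
        "scientific_metadata": dict(scientific_metadata),
    }


record = build_dataset_record(
    pgroup="p00001",
    beamline="TOMCAT",
    files=[("scan_0001.h5", 2048), ("scan_0002.h5", 4096)],
    scientific_metadata={"sample": "example sample", "energy_keV": 12.0},
)
print(json.dumps(record, indent=2))
```

Keeping administrative fields separate from the free-form scientific block mirrors the split described on the Data Catalogue Purpose slide and leaves each beamline free to define its own scientific fields.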

41 Impact on Users
- Users will be able to add/link metadata, especially metadata which cannot be created automatically (e.g. sample data).
- Users will see and find their data via a data catalog GUI.
- Users will see the lifetime of the data on the disk connected to the analysis cluster.
- Users will be able to fetch back from tape data which was deleted on disk.
- As long as data is on disk, the actual analysis can be done without help from the data catalog (as today).

42 Administrative Data Model

43 Scientific Meta Data
- Scientific metadata is up to the beamline managers to define (in collaboration with the users).
- Aim for standardization, e.g. via use of the HDF5 and NeXus formats.
- The catalog per se does not pose any limits here.
- See example on next page.

44 Example of Scientific Meta Data

45 Where is the data stored ?
The data is stored initially on the online file server (connected to online clusters).
It is then auto-copied to the offline file server (connected to the offline cluster 'Ra') and copied to/from tape at CSCS in Lugano.

46–49 GUI Screenshots

50 Integration Challenges

51 A Common Journey
- It is more a common journey of beamline managers/scientists and IT experts than a one-time introduction of a new tool.
- Constant feedback from scientists is wanted and needed.
- WIP: there is still a lot to do and not all questions can be answered today.
- The tool should help researchers in their research work by offloading operational duties.
- The system has a flexible architecture which allows specific needs to be taken into account if they appear (but we have to work within the limits of the available manpower).

52 Wir schaffen Wissen – heute für morgen (We create knowledge – today for tomorrow)
My thanks go to:
- Colleagues from AIT
- DaaS project members
- Involved beamline managers
