DMPonline Adaption of template Sacha Zurcher & Stine Vejlebo Hansen RUb
Introduction to Data Management Plan A data management plan (DMP) is a document that outlines how you will handle your research data both during your research, and after the project is completed. It describes the data that are being created, processed or collected in your research, and how they are treated. The goal of a DMP is to consider the many aspects of data management, metadata generation, data preservation, and analysis before the project begins. This ensures that data are well-managed in the present, and prepared for preservation in the future. A DMP can also help to identify areas where there may be potential problems that should be addressed in cooperation with your research group, collaborating partners, or colleagues. It is important to bear in mind that a DMP is not a fixed document but should be regularly updated during the project. The DMP is divided into 6 sections, each representing one part of the research life cycle. Each section comes with an introduction, and a guidance to the questions. The DMP starts with some basic information about your project, followed by sections on: data collection; documentation & metadata; ethics & legal compliance; storage & backup; selection, preservation & sharing; and responsibilities & resources. Not all questions are equally relevant for all areas of research. A DMP is easier to complete if you are familiar with the Danish Code of Conduct: http://ufm.dk/publikationer/2014/the-danish-code-of-conduct-for-research-integrity
Plan details Please fill out the basic project details below and click 'Update' to save (tilføjelser er skrevet med rød) Plan name Project ID Grant number Principal Investigator/Researcher Principal Investigator/ ID of Researcher (help text should also include ISI researcher ID, SCOPUS ID etc.) Principal Data Plan contact Organizations involved Project duration Description Date of first version Date of last update (can this be generated automatically?) Related policies (questions as formulated by SB)
Data Collection Intro Research data is data that is collected, observed, or created, for purposes of analysis to produce original research results. Research data can be generated for different purposes and through different processes, and can be divided into different categories like observational data, experimental data, simulation data, derived or compiled data. The category(ies) your data comes from will affect the choices that you make throughout your data management plan.
Data Collection Data that will be collected or created Guidance to questions: Here you can give a short description of the data you will create, or third-party data you plan to reuse. Consider the types of data your research will generate, for example qualitative survey data, computational models, statistics, measurements, text, images, or audiovisual data. Also consider the longevity of the file formats that you choose. Using open standards so your data can be read by a variety of software makes it easier to preserve and share with others. What type of data will be collected? Which file formats are the data in? How much data will be generated? 0-5 GB 5 GB – 50 GB 50 GB – 500 GB 500 GB – 2 TB > 2 TB 4. If you are using non digital data, how are you storing those data?
Data Collection How the data will be collected or created Guidance to questions: Outline how the data will be collected/created and which community data standards (if any) will be used. Consider how the data will be organised during the project, mentioning for example naming conventions, version control and folder structures. Explain how the consistency and quality of data collection will be controlled and documented. This may include processes such as calibration, repeat samples or measurements, standardised data capture or recording, data entry validation, peer review of data or representation with controlled vocabularies. What standards will you use? Which methodology will you use? How will you structure and name your folders and files? How will you handle versioning? What quality assurance processes will you adopt?
Documentation and Metadata Intro Producing good documentation and metadata provides context for your data, and makes it easier to find and use in the long term. The amount of effort put into documenting your data will depend on the intended lifespan and how broadly you intend to share it. You may want to capture details about what instrumentation has been used and how that has been calibrated, full variable and value labels, and details about your methodology. Some of this information may be captured already in lab notebooks, project documents or research papers.
Documentation and Metadata Guidance to questions: Metadata should be created to describe the data and aid discovery. Consider how you will capture this information and where it will be recorded e.g. in a database with links to each item, in a ‘readme’ text file, in file headers etc. Researchers are strongly encouraged to use community standards to describe and structure data, where these are in place. How will you capture / create the metadata? Can any of this information be created automatically? What metadata standards will you use?
Documentation and Metadata Data Documentation Guidance to questions: Describe the types of documentation that will accompany the data to provide secondary users with any necessary details to prevent misuse, misinterpretation or confusion. This may include information on the methodology used to collect the data, analytical and procedural information, definitions of variables, units of measurement, any assumptions made, the format and file type of the data. What metadata, documentation or other supporting material should accompany the data for it to be interpreted correctly? What information needs to be retained to enable the data to be read and interpreted in the future?
Ethics and legal compliance Intro Consider whether you will be creating any sensitive data that need to be kept confidential to protect participants or to safeguard Intellectual Property. If so, what procedures will you follow, for example obtaining permissions via consent agreements and data licenses, or protecting information via anonymization, restricted access and embargo periods?
Ethics and legal compliance Guidance to questions: Investigators carrying out research involving human participants must ensure that consent is obtained to allow data to be shared and reused. Managing ethical concerns may include: anonymization of data; referral to departmental or institutional ethics committees; and formal consent agreements. Ethical issues may affect how you store data, who can see/use it and how long it is kept. Have you gained consent for data preservation and (possibly) sharing? How will sensitive data be handled to ensure it is stored and transferred securely? How will you protect the identity of participants?
Ethics and legal compliance Copyright and Intellectual Property Rights (IPR) Guidance to questions: State who will own the copyright and IPR of any new data that you will generate. For multi-partner projects, IPR ownership may be worth covering in a consortium agreement. If purchasing or reusing existing data sources, consider how the permissions granted to you affect licensing decisions. Outline any restrictions needed on data sharing e.g. to protect proprietary or patentable data. Who owns the data? If the data is suitable for reuse. How will they be licensed? If you are using third-party data, how do the permissions you have been granted affect licensing? Will data sharing be postponed / restricted e.g. to seek patents?
Storage and backup Intro While conducting research, but also after completion of the research, it is worthwhile reflecting on the way you want to preserve your data. Preservation of data can be necessary in regards to documentation, e.g. for the use of validation or peer review, but also when you plan to reuse your data, or allow others to benefit from your data. Depending on the type, amount and intended use of data, special infrastructures might be required for storing and sharing data.
Storage and backup Storage and backup during the research Guidance to questions: Describe where and how the data will be stored and backed up to ensure that data and metadata are securely stored during the lifetime of the project. Storing data on laptops, computer hard drives or external storage devices alone is very risky. The use of robust, managed storage provided by university IT teams is preferable. Similarly, it is normally better to use automatic backup services provided by IT Services than rely on manual processes. If you choose to use a third-party service, you should ensure that this does not conflict with any funder, institutional, departmental or group policies, for example in terms of the legal jurisdiction in which data are held or the protection of sensitive data. Where will the data be stored? Do you have access to enough storage or will you need to include charges for additional services? How will the data be backed up? Who will be responsible for backup and recovery? How will the data be recovered in the event of an incident?
Storage and backup Access and security Guidance to questions: If your data is confidential (e.g. personal data not already in the public domain, confidential information or trade secrets), you should outline any appropriate security measures and note any formal standards that you will comply with e.g. ISO 27001. What are the risks to data security and how will these be managed? How will you control access to keep the data secure? How will you ensure that collaborators can access your data securely? If creating or collecting data in the field how will you ensure its safe transfer into your main secured systems? How is security for sensitive data guaranteed?
Selection, preservation and sharing Intro There may be a risk of data getting lost, e.g. once the research is finished or if the researcher who collected the data leaves the institution. Here you can explain your long-term plans for the data. Consider what needs to be preserved and whether that can be shared or not. This will have a bearing on what data repositories or other services you use. Consider which data are of long-term value and should be retained. You should keep the data needed to validate any published findings, and may wish to keep other data that could be reused. How long do the data need to be kept? Check if you are subject to legal obligations or expectations from your funder.
Selection, preservation and sharing Data with long-term value Guidance to questions: Decide which data to keep and for how long. This could be based on any obligations to retain certain data, the potential reuse value, what is economically viable to keep, and any additional effort required to prepare the data for data sharing and preservation. Remember to consider any additional effort required to prepare the data for sharing and preservation, such as changing file formats. What data should be retained? What data must be retained for contractual, legal, or regulatory purposes? What data must be destroyed for contractual, legal, or regulatory purposes? How will you decide what other data to keep? What are the foreseeable research uses for the data?
Selection, preservation and sharing Long term preservation of dataset(s) Guidance to questions: Consider how datasets that have long-term value will be preserved and curated beyond the lifetime of the grant. Also outline the plans for preparing and documenting data for sharing and archiving. If you do not propose to use an established repository, the DMP should demonstrate that resources and systems will be in place to enable the data to be curated effectively beyond the lifetime of the grant. Where e.g. in which repository or archive will the data be held? How long will the data be retained and preserved? Have you costed in time and effort to prepare the data for sharing / preservation? What costs if any will your selected data repository or archive charge?
Selection, preservation and sharing Data sharing Guidance to questions: Consider where, how, and to whom data should be made available. The methods used to share data will be dependent on a number of factors such as the type, size, complexity and sensitivity of data. Consider how people might acknowledge the reuse of your data (e.g. via citations) so you gain impact. Can you share your data? Yes / No (If yes please continue, if not, please clarify where it will be stored and preserved) Which data will be shared and how? With whom will you share the data, and under what conditions? Will you share data via a repository, handle requests directly or use another mechanism? Will you pursue getting a persistent identifier for your data? When will you make the data available? How will potential users find out about your data?
Selection, preservation and sharing Restrictions on data sharing Guidance to questions: Outline any expected difficulties in sharing data, along with causes and possible measures to overcome these. Restrictions may be due to confidentiality, lack of consent agreements or IPR, for example. Consider whether a non-disclosure agreement would give sufficient protection for confidential data. Are there restrictions on data sharing? Yes / No (If yes please continue) What action will you take to overcome or minimize restrictions? For how long do you need exclusive use of the data? Will a data sharing agreement (or equivalent) be required?
Responsibilities and Resources Intro Outline the roles and responsibilities for all activities in the research data life cycle. Consider who will be responsible for ensuring relevant policies will be respected. For collaborative projects you should explain the co-ordination of data management responsibilities across partners. Carefully consider any resources needed to deliver the plan, e.g. software, hardware, technical expertise, etc. Where dedicated resources are needed, these should be outlined and justified.
Responsibilities and Resources Guidance to questions: Outline the roles and responsibilities for all activities e.g. data capture, metadata production, data quality, storage and backup, data archiving & data sharing. Consider who will be responsible for ensuring relevant policies will be respected. Individuals should be named where possible. Who is responsible for implementing the DMP, and ensure it is reviewed and revised? Who will be responsible for each data management activity? How will responsibilities be split across partner sites in collaborative research projects? Will data ownership and responsibilities for RDM be part of any consortium agreement or contract agreed between partners?
Responsibilities and Resources Guidance to questions: Carefully consider any resources needed to deliver the plan, e.g. software, hardware, technical expertise, etc. These costs can usually be written into grant applications but need to be clearly outlined and justified. Is additional specialist expertise (or training for existing staff) required? Do you require hardware or software which is additional or exceptional to existing institutional provision? Will charges be applied by data repositories? Yes / No If charges are applied, how you are going to cover for the costs?