Writing a successful data management plan Kathleen Fear October 17, 2013
A little about me… Physics to social science – Research of my own in physics, library and information science – Embedded data management research with public health, biomedical research, botany, proteomics, archaeology, social sciences …I just really like data.
What is a DMP? A formal plan outlining how you will handle your data throughout and after your project… …which is now required by many funders… …and which is a good idea anyhow, even if it’s not required.
Goals of this session Learn the basic components of a DMP Understand how good data management practices translate to a good DMP
One size does not fit all… But we’ll cover general guidelines
DMP components Data Products Description and Organization Access and Security Reuse and Sharing Archiving and Preservation
Data Products Describe the kind of data you’re collecting or using, whether it’s digital… …or physical. (or all of the above)
Data Products: What to specify What are your data products, both primary and derived? When will you collect / produce each data product? How much data will you generate? What file types? Open or proprietary?
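A rough way to answer "How much data will you generate?" is a back-of-the-envelope estimate. The sketch below is only illustrative: the function name, the parameters, and the example numbers are assumptions, not figures from any real project, and it uses decimal units (1 GB = 1000 MB).

```python
def estimate_storage_gb(files_per_day, mb_per_file, collection_days, copies=1):
    """Estimate total storage in GB (decimal: 1 GB = 1000 MB).

    `copies` counts every stored copy, e.g. 3 for a working copy
    plus two backups. All inputs are hypothetical planning numbers.
    """
    total_mb = files_per_day * mb_per_file * collection_days * copies
    return total_mb / 1000


# Hypothetical project: 50 files a day, 20 MB each, for 180 days.
working_copy_gb = estimate_storage_gb(50, 20, 180)
with_backups_gb = estimate_storage_gb(50, 20, 180, copies=3)
print(working_copy_gb, with_backups_gb)
```

Stating the estimate together with its assumptions (file counts, file sizes, number of copies) tends to be more useful in a DMP than a bare total.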
Description and Organization Describing and organizing your data makes your work easier, and provides context for those you share with
Description & Organization: What to specify Naming standards: – Can you tell what a file is and what it contains without opening it? How do your files relate to one another?
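One way to make a naming standard concrete is to generate and check file names in code. The convention below (project, site, collection date, version, extension) is a hypothetical example, as are the function names and sample values; adapt the fields to whatever your own plan specifies.

```python
import re
from datetime import date

# Hypothetical convention: <project>_<site>_<YYYYMMDD>_v<NN>.<ext>
NAME_PATTERN = re.compile(
    r"^(?P<project>[a-z0-9]+)_(?P<site>[a-z0-9]+)_"
    r"(?P<date>\d{8})_v(?P<version>\d{2})\.(?P<ext>[a-z0-9]+)$"
)


def build_name(project, site, collected, version, ext):
    """Build a file name that follows the convention above."""
    return f"{project}_{site}_{collected:%Y%m%d}_v{version:02d}.{ext}"


def parse_name(filename):
    """Return the name's fields as a dict, or None if it breaks the convention."""
    match = NAME_PATTERN.match(filename)
    return match.groupdict() if match else None


name = build_name("airq", "roc", date(2013, 10, 17), 1, "csv")
print(name)
print(parse_name(name))
print(parse_name("final data (2).xls"))
```

A name that parses cleanly tells you what the file is without opening it; a name that returns None is a candidate for renaming.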
Description & Organization: What to specify Controlled vocabulary: Ontologies and taxonomies
Description & Organization: What to specify Metadata: Contextualizing information about an object, physical or digital Some fields have defined standards; some repositories ask for a specific set of metadata
Metadata Where does it go? Lab notebook, Codebook, readme.txt, XML file
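If metadata lives in a readme.txt, generating it from a small structured record helps keep it consistent across datasets. This is a minimal sketch: the field names (Title, Creator, Collected, Variables) and all sample values are assumptions for illustration, not a metadata standard; use your field's standard or your repository's required fields where they exist.

```python
def render_readme(title, creator, collected, variables):
    """Render a plain-text readme from basic descriptive metadata.

    `variables` maps each variable name to a short description with units.
    """
    lines = [
        f"Title: {title}",
        f"Creator: {creator}",
        f"Collected: {collected}",
        "Variables:",
    ]
    for name, description in variables.items():
        lines.append(f"  {name}: {description}")
    return "\n".join(lines) + "\n"


# Hypothetical dataset description.
readme_text = render_readme(
    title="Air quality measurements",
    creator="Example Lab",
    collected="2013-06-01 to 2013-08-31",
    variables={
        "pm25": "fine particulate matter, micrograms per cubic meter",
        "temp_c": "air temperature, degrees Celsius",
    },
)
print(readme_text)
```

The resulting text can be saved as readme.txt alongside the data files it describes.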
Access and Security Though funders encourage sharing, your DMP should also describe how you will protect and secure your data, both while you are actively using it and after the end of a project
Access and Security: What to specify Backup: – Where? (and what?) Local (hard drive, dept/local server, personal laptop, flash drive) vs. distant (PDC, hard drive at home) Central (PDC, UR Research) vs. cloud (Amazon, Box, CrashPlan, Google Drive) – How often? – Who’s responsible? Security: Locked cabinets? Password-protected computer? Non-networked storage?
Reuse and Sharing
A DMP does NOT: Require that you share all data with anyone who wants it “at no more than incremental cost and within a reasonable time” (NSF) “indicate the criteria for deciding who can receive your data” (NIH)
Reuse and sharing: what to specify What data products will you share freely? When? How? – Data necessary for replication of public results – Other data? What data products won’t you share freely? Why not? Consider restrictions, embargo, etc. for data that can’t be immediately shared freely
Reuse and sharing: licensing Raw data is not copyrightable in the US – CCZero – ODC-PDDL Other materials may be – CC licenses
Archiving and Preservation Plan ahead for what will happen to your data long-term, beyond its current use in your project.
Archiving & Preservation: What to specify Where will you put your data? What will you save and what will you discard? How will you plan for ongoing usability? …format migration? …integrity checking and refreshing? …maintaining security?
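Integrity checking is commonly done by recording a checksum for each file and comparing against it later. The sketch below uses SHA-256 from Python's standard library; the function names and the manifest layout (relative path mapped to hex digest) are assumptions for illustration, not a repository requirement.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under `root` (by relative path) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root, manifest):
    """List manifest entries that are now missing or have a different checksum."""
    current = build_manifest(root)
    return sorted(name for name in manifest if current.get(name) != manifest[name])
```

In practice you would save the manifest when you deposit or archive the data, then rerun `changed_files` on a schedule; a non-empty result means a file was altered, corrupted, or lost and should be restored from backup.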
Placing data in a repository Long-term commitment to data preservation Higher visibility for your data Permanent URL / DOI enables data citation Reuse tracking and usage statistics
Placing data in a repository UR Research: – Example: STOP-ROP Clinical Trial
Library-hosted 2GB soft limit Backed up, secure Free!
Placing data in a repository UR Research: Repository directories: re3data.org; biosharing.org
Integration with journal submission processes Link to data held elsewhere Not free: $80/submission
Revisiting Metadata and Documentation Information about data processing, collection details: the ‘story’ of the data (…but it’s all in the paper!) Are your variable names meaningful? Is it clear how different parts of the dataset relate to each other? Is it in a format others can use?
Emphasize key components Data Products Description and Organization Access and Security Reuse and Sharing Archiving and Preservation Example: A survey exploring drug use and abuse among teenagers in a large Midwestern city
Emphasize key components Data Products Description and Organization Access and Security Reuse and Sharing Archiving and Preservation Example: A study of the relationship between air pollution and metabolic syndrome, using existing data
A little help: DMPTool dmptool.org
A little help: UR Data Management website library.rochester.edu/data-management/goals
A little help: consultation Call me! (Or , or drop by.) Carlson 313E DMP consultation & review; trainings; data archiving support; etc.
A request When you get a grant funded, send me your DMP. If you’re comfortable, if you get negative feedback on your DMP, share it with me. Fill out the forthcoming data management survey.
Questions?