Managing your research data
Ben Taylorson, Academic Liaison Librarian
Outline
- Why manage data?
- What is data management?
- Data life cycle
- Putting together a plan
- Actively managing data
- Metadata
- Backups
- Versions
- Storing and sharing
What are data?
Research data, unlike other types of information, are collected, observed or created for the purpose of analysis to produce original research results.
- Qualitative or quantitative
- Analogue or digital: both have challenges
Reasons to manage your data
- Responsible conduct of research
- Funding body grant requirements
- Research integrity and replication
- Increase research efficiency
- Save time and resources
- Enhance data security
- Prevent duplication of effort by enabling others to use your data
Climategate
- In November 2009, 1,000 private emails and many other documents were stolen or leaked from the University of East Anglia's (UEA) Climatic Research Unit (CRU).
- While a House of Commons (HoC) Select Committee cleared the CRU of scientific failings, it did find room for improvement in its research practices.
Activity: What is data management?
Think about:
- A definition
- Elements of data management
- Questions/issues
A definition
“Actively managing data for as long as it continues to be of scholarly, scientific, research and/or administrative interest […] managing it from its point of creation until it is determined not to be useful, and ensuring its long-term accessibility and preservation, authenticity and integrity.”
Adapted from the Digital Curation Centre's definition of digital curation.
It is not just archiving or preservation.
What is data management?
- Planning
- Creation
- Processing
- Describing, archiving and organisation
- Analysing
- Preservation and security
- Access and reuse
- Ethics and privacy
- Disposal
Data life cycle
Diagram: creating, processing, analysing, preserving, giving access, reusing
Source: www.data-archive.ac.uk/create-manage/life-cycle
Specific plans
- ICPSR Framework: www.icpsr.umich.edu/icpsrweb/content/ICPSR/dmp/framework.html
- Digital Curation Centre Data Management Plan: www.dcc.ac.uk/dmponline
- Individual institutions, e.g. Oxford (www.admin.ox.ac.uk/rdm/dmp/plans/) and MIT (http://libraries.mit.edu/guides/subjects/data-management/)
Creating data
- Types of data, e.g. text, numerical, models, multimedia, software
- Format, e.g. Word or PDF? XML or Excel? Consider longevity and choose open formats.
- How much data will you produce?
- How will you document it?
- Will the data change or be updated? How will you track changes?
- Will it be reproducible? What if it was lost?
Metadata
Accurately describing your data so that:
- you can find and understand it again efficiently
- others can reuse your data easily
Types: descriptive, administrative, structural
- Basic: files and folders in Windows
- Complex: XML, Dublin Core
Where will this be stored? With the data? Will you need additional storage or software?
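As a minimal sketch of "complex" metadata, the Python snippet below builds a simple Dublin Core record as XML using only the standard library. The dc namespace URI and the 15 element names come from the Dublin Core Metadata Element Set 1.1; the "metadata" wrapper element, the function name dublin_core_record and all example values are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the Dublin Core Metadata Element Set, version 1.1
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The 15 elements of the Dublin Core Metadata Element Set
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

def dublin_core_record(fields):
    """Build a Dublin Core XML string from a dict of element name -> value.

    The <metadata> wrapper element is a hypothetical choice for this sketch.
    """
    root = ET.Element("metadata")
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"{name!r} is not a Dublin Core element")
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")

# Hypothetical example values for one interview recording
record = dublin_core_record({
    "title": "Interview with a high-level government official, phase 1",
    "creator": "A. Researcher",
    "date": "2011-06-15",
    "format": "audio/mpeg",
    "language": "en",
})
print(record)
```

The sketch rejects names outside the 15 elements, so details such as funder or methodology would need a richer schema or a separate readme file stored with the data.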
Basic metadata
For your own use:
- Start with a project-level descriptor, then break it down into useful groupings
- Include a unique element, such as a date
Example:
PhD\Primary Research\Interviews\phase 1\Government officials\Highlevel\MrSmith15062011.mp3
PhD\Primary Research\Interviews\phase 1\Government officials\Highlevel\MrSmith15062011.docx
Image credit: Flickr
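The naming pattern above (project-level folders, then groupings, then a file name made unique by the interviewee and the date) can be generated consistently with a short Python sketch. The function name interview_path is hypothetical, and the DDMMYYYY date format simply mirrors the example paths.

```python
from datetime import date
from pathlib import PureWindowsPath

def interview_path(project_root, groupings, interviewee, when, extension):
    """Build project folder -> grouping folders -> unique file name.

    The file name joins the interviewee and the date as DDMMYYYY,
    matching the example paths; both are illustrative choices.
    """
    stem = f"{interviewee}{when.strftime('%d%m%Y')}"
    return PureWindowsPath(project_root, *groupings, f"{stem}.{extension}")

groupings = ["Primary Research", "Interviews", "phase 1",
             "Government officials", "Highlevel"]
audio = interview_path("PhD", groupings, "MrSmith", date(2011, 6, 15), "mp3")
transcript = interview_path("PhD", groupings, "MrSmith", date(2011, 6, 15), "docx")
print(audio)
print(transcript)
```

One design alternative: a YYYYMMDD date (strftime pattern %Y%m%d) makes file names sort in chronological order.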
More complex metadata
- Dates
- Funders
- Language
- Location
- Rights
- List of file names and relationships
- Formats
- Methodology
More complex metadata (continued)
- Workflows
- Sources
- Versions
- Checksums
- Explanation of codes used in file names
- List of codes used in files
Store metadata in a text file (such as a readme file or codebook) in the same directory as the data.
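A minimal Python sketch of the readme approach: it writes descriptive fields followed by a list of file names, formats and SHA-256 checksums into a text file in the same folder as the data. The file name README.txt, the tab-separated layout, the function names and the placeholder field values are all assumptions for illustration; SHA-256 is just one common checksum choice.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 checksum of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def write_readme(data_dir, fields):
    """Write README.txt into data_dir: descriptive fields first, then
    one tab-separated line per data file (name, format, SHA-256)."""
    data_dir = Path(data_dir)
    readme = data_dir / "README.txt"
    lines = [f"{key}: {value}" for key, value in fields.items()]
    lines += ["", "Files (name, format, SHA-256):"]
    for path in sorted(data_dir.iterdir()):
        # Skip folders and the readme itself
        if path.is_file() and path.name != readme.name:
            file_format = path.suffix.lstrip(".") or "unknown"
            lines.append(f"{path.name}\t{file_format}\t{sha256_of(path)}")
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme

# Usage with a throwaway folder and hypothetical metadata values
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "MrSmith15062011.docx").write_bytes(b"transcript text")
    readme = write_readme(folder, {
        "Title": "Phase 1 interviews",
        "Funder": "(funder name)",
        "Language": "English",
        "Rights": "(licence or access conditions)",
    })
    print(readme.read_text(encoding="utf-8"))
```

Re-running the checksum later and comparing it with the recorded value is a simple way to detect a file that has been altered or corrupted.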
Version control
- Will you retain originals or overwrite them as you go?
- Will anyone else be editing the information, and do you need to track their changes?
- Consider this before deciding on naming conventions.
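If you decide to retain originals, one option is to save each edit as a new numbered version rather than overwriting. The Python sketch below assumes a hypothetical naming convention, name_vNN.ext (e.g. notes_v03.docx), and works out the next free version number; the convention and the function name next_version_path are illustrative, not a standard.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical convention: <name>_v<number><extension>, e.g. notes_v03.docx
VERSION_PATTERN = re.compile(r"^(?P<base>.+)_v(?P<num>\d+)$")

def next_version_path(path):
    """Return the path for the next version of a file, leaving the
    original and all earlier versions untouched."""
    path = Path(path)
    match = VERSION_PATTERN.match(path.stem)
    base = match.group("base") if match else path.stem
    numbers = [0]
    if path.parent.is_dir():
        for sibling in path.parent.iterdir():
            found = VERSION_PATTERN.match(sibling.stem)
            if sibling.suffix == path.suffix and found and found.group("base") == base:
                numbers.append(int(found.group("num")))
    return path.with_name(f"{base}_v{max(numbers) + 1:02d}{path.suffix}")

with tempfile.TemporaryDirectory() as folder:
    original = Path(folder, "notes.docx")
    original.write_text("first draft")
    print(next_version_path(original).name)  # notes_v01.docx
```

Where several people edit the same files, a dedicated version control system such as Git records who changed what, which a naming convention alone cannot.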
Storage: short term
- Think about the volume of data
- Which media will you use? Do you need something more than a DVD or portable hard drive?
- Security
- Cost
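To think about volume, it can help to measure how much data you already hold and compare it with the capacity of the media you are considering. A minimal Python sketch, assuming your data sit under a single folder; the function names are hypothetical and the units are binary (1 KB = 1024 bytes).

```python
import tempfile
from pathlib import Path

def data_volume_bytes(root):
    """Total size in bytes of all files under root, including subfolders."""
    return sum(p.stat().st_size for p in Path(root).rglob("*") if p.is_file())

def human_readable(num_bytes):
    """Format a byte count using binary (1024-based) units."""
    size = float(num_bytes)
    for unit in ["B", "KB", "MB", "GB"]:
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TB"

with tempfile.TemporaryDirectory() as folder:
    Path(folder, "interviews").mkdir()
    Path(folder, "interviews", "MrSmith15062011.mp3").write_bytes(bytes(3 * 1024))
    print(human_readable(data_volume_bytes(folder)))  # 3.0 KB
```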
Backups
- Make 3 copies which are geographically distributed (original + external/local + external/remote)
- ITS will do much of this for you, but what if you are working remotely from Durham?
- How frequently will you back up?
- Analogue data: consider digitising it if it is unique
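A minimal Python sketch of making the two extra copies and checking that each one is intact, by comparing SHA-256 checksums with the original. The destination folders stand in for an external/local drive and an external/remote location; their names, the function names and the choice of checksum are assumptions, and the sketch does not schedule backups.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def file_digest(path):
    """SHA-256 checksum of a file (reads the whole file: fine for modest sizes)."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def back_up(original, destination_dirs):
    """Copy original into each destination folder, then confirm each copy's
    checksum matches the original's. Returns the paths of the copies."""
    original = Path(original)
    expected = file_digest(original)
    copies = []
    for folder in destination_dirs:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(original, folder / original.name))
        if file_digest(copy) != expected:
            raise OSError(f"Checksum mismatch for backup copy {copy}")
        copies.append(copy)
    return copies

# Usage: hypothetical folders standing in for local and remote backup locations
with tempfile.TemporaryDirectory() as root:
    original = Path(root, "data", "survey.csv")
    original.parent.mkdir()
    original.write_text("id,answer\n1,yes\n")
    copies = back_up(original, [Path(root, "external_drive"), Path(root, "remote_share")])
    print(len(copies))  # 2
```

Together with the original, the two verified copies give the three copies recommended above.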
Preservation
Long term, more strategic
- Selection criteria
- Time-scale: how long will it be saved for?
- Disposal
- Is additional information necessary for deposit?
- Does it need to be migrated?
- Where will it be deposited? Will they manage it for you?
Where to preserve your data
- UK Data Archive
- Archaeology Data Service, History Data Service, Economic and Social Data Service, Oxford Text Archive
- There is no single repository for data at Durham University, only for outputs; speak to Sebastian Palucha at the Main Library.
Sharing
- Will you share it? Are you obliged to share it?
- Who will be interested in it? How might they use it?
- Are there reasons not to fully disclose the data?
- How will it be accessed?
- When will you make it available? Will there be an embargo?
- Will you publish findings that rely on the data?
- Consider Freedom of Information (FOI) requests: http://foiresearchdata.jiscpress.org/
Dissemination
- Deposit in a specialist data centre dedicated to archiving digital data
- Submit to a journal (this may be required)
- Deposit in a self-archiving system or an institutional repository
- Make available via a project or institutional website
- Share informally on a peer-to-peer basis, e.g. by email
Activity
- Thinking about the data life cycle, look at the ICPSR guidance.
- Try to fill in some of the sections of the DCC Data Management Plan.
- Have you identified any areas on which you will need to seek further advice?
Sources of guidance
- Durham University
- UK Data Archive (Social Sciences and Humanities): Create and Manage Data
- Digital Curation Centre
- Research Information Network
- Funders’ web sites
Conclusions
- Good data management = good research practice
- Data need management throughout their life cycle
- Planning is helpful and may be a funder requirement
- Deposit data for preservation and access
Slides available at www.dur.ac.uk/library/research/