Short-term storage and data documentation Mari Wigham COMMIT/
This presentation What data should you store? How can you store it? (in the short term) Why document your data? Example: Study on the effects of diet on health Questions
What data should you store? Raw data Final data Papers but also Intermediate data Drafts of papers Methods Equipment and materials Labnotes ...
What data should you store? Everything you need to be able to do your work Everything your colleagues need to do their work Everything required by your funding organisation Everything necessary to reproduce your results
What is an electronic labnote?
What should you use for electronic labnotes? Smartphone/tablet/laptop/PC... Dedicated e-labnotebook software or standard software What do you want to be able to do with it? ● Take notes? ● Access internet? ● Log on to your network? ● Write documents? ● Give presentations? ● Use in the lab? ● Link up with lab systems?
Electronic lab notes
Short term storage: where?

Personal computer/laptop
● Advantages: always available; portable
● Disadvantages: What if it breaks or is stolen? What if you are ill or away?
● Suitable for: temporary storage

Network drive
● Advantages: managed file servers; regularly backed up and maintained; stored securely; stored centrally
● Disadvantages: costs; may not be accessible from everywhere or by everyone
● Suitable for: master copy (if enough space is provided)

External storage devices – USB, flash etc.
● Advantages: low cost; portable
● Disadvantages: easily damaged or lost; insecure
● Suitable for: temporary storage

Cloud services – Dropbox, Figshare, SkyDrive etc.
● Advantages: automatic sync (some services); easy access
● Disadvantages: Is it secure? No control over backup procedure
● Suitable for: data sharing
Short term storage
Short term storage – what are the issues? Space Access ● From where? ● By whom? Versioning Backups Finding it again!
Short term storage: Basic tips Space ● Try to estimate how much you will need ● How will you monitor use? ● What do you do if you need more? ● What is your procedure for deletion? Access ● Think about who will need access and from where ● What is your alternative if there is temporarily no access? ● Does everyone have the same access and edit rights?
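One way to monitor use is sketched below. This is a minimal sketch, not a recommended tool: the function name `usage_report` and the 80% warning threshold are assumptions chosen for illustration, and on a shared network drive the figures reported may reflect the whole volume rather than your own quota.

```python
import shutil


def usage_report(path, warn_fraction=0.8):
    """Return (used_fraction, warn) for the volume that holds `path`.

    `warn_fraction` is a hypothetical threshold: above it, it is time to
    ask for more space or to apply your deletion procedure.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    used_fraction = usage.used / usage.total
    return used_fraction, used_fraction >= warn_fraction


if __name__ == "__main__":
    fraction, warn = usage_report(".")
    message = f"{fraction:.0%} of the storage is in use"
    if warn:
        message += " (consider requesting more space or cleaning up)"
    print(message)
```

Running such a check on a schedule, for example weekly, answers "How will you monitor use?" without relying on anyone remembering to look.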
Short term storage: Basic tips Versioning ● Use a file in one (online) location as the “master”, and do all your modifications and processing on copies of that master ● When you have consolidated your changes and do not want to lose them, replace the master file with the consolidated file ● Indicate versions clearly – especially which is the master! ● Use a naming convention that includes a date or number (e.g. ..._v1, ..._v2) ● Keep track of ‘milestone files’
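The numbered naming convention can be applied automatically. The sketch below assumes files are named `<stem>_v<number><suffix>` (for example `diet_v1.csv`, a hypothetical name); the function `next_version_path` is an illustration, not part of any standard tool.

```python
import re
from pathlib import Path


def next_version_path(folder, stem, suffix):
    """Return the path for the next numbered version of a file.

    Looks for existing files named <stem>_v<N><suffix> in `folder` and
    returns <stem>_v<N+1><suffix>; returns version 1 if none exist yet.
    """
    pattern = re.compile(rf"^{re.escape(stem)}_v(\d+){re.escape(suffix)}$")
    versions = []
    for entry in Path(folder).iterdir():
        match = pattern.match(entry.name)
        if match:
            versions.append(int(match.group(1)))
    next_number = max(versions, default=0) + 1
    return Path(folder) / f"{stem}_v{next_number}{suffix}"
```

Comparing the version numbers as integers rather than as text means that `_v10` is correctly treated as newer than `_v9`.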
Short term storage: Basic tips Backups ● As soon as possible ● Regularly ● How easily can you get hold of the backup? ● Make sure the backup is as independent as possible from the main storage Finding ● Use descriptive names (descriptive for others, not just yourself!) ● Document your data
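A backup is only useful if the copy is intact. The sketch below copies a file to a separate backup location and compares checksums afterwards. The function names `sha256_of` and `backup_file` are assumptions for illustration; to keep the backup independent, `backup_dir` should be on a different device or service than the main storage.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_file(source, backup_dir):
    """Copy `source` into `backup_dir` and verify that the copy is identical."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / Path(source).name
    shutil.copy2(source, target)  # copy2 also tries to preserve timestamps
    if sha256_of(source) != sha256_of(target):
        raise OSError(f"Backup of {source} does not match the original")
    return target
```

Note that this overwrites an earlier backup with the same file name, so it fits best alongside a versioned naming convention.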
For yourself For data processing and analysis Help in writing reports and papers Reference for the future ● Will you still understand it in 2 months, 6 months, 2 years..?
For others Provenance and traceability ● Patents ● Fraud ● Accusations of fraud/unethical behaviour Journals are starting to ask for the data behind the paper Research institutes and funding institutions such as the EU and NWO also increasingly want the data Your research colleagues – the ‘lone genius’ is very rare The rest of the scientific world...
...can learn from your successes...
...and your failures
Documentation = paper?
Documentation = software application?
Data documentation Context is essential!
The context comes from you!
Example Study to examine the effects of diet on health - Conducted over 3 years by 3 researchers – Peter, Lisa and Anna There are many ways to organise the data. We will look at three: - By researcher - By year - By activity
Example It is now the summer holidays. Peter and Anna are on holiday, and Lisa has received some urgent questions from the reviewers. They need to know: ● the procedure used to produce the high-protein diet ● which bureau measured the data ● what sort of preprocessing was carried out on the data
Why don’t we...
Organisation by year/researcher You need to know what was done when, or by whom
Example – Organising by activity Easy to navigate: for each question you quickly find the right folder, even if you had no prior knowledge.
Example – Organising by activity You still need to do quite a lot of detective work to find the information: you have to rely on good names, guesswork, and... reading through the content of the files.
Descriptions and links Enter a brief description for each activity (folder) It may help to identify types of files (e.g. dataset, procedure, sample, document) Linking to items produced in other activities allows you to: ● follow the workflow ● reuse items ● avoid problems due to multiple copies
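Descriptions and links do not require special software. As a minimal sketch, each activity folder could hold a small description file that records the description, the type of each file, and relative links to items produced in other activities. The file name `DESCRIPTION.json`, the function names, and the record layout are all assumptions made for this illustration.

```python
import json
from pathlib import Path

DESCRIPTION_FILE = "DESCRIPTION.json"  # hypothetical file name


def write_activity_description(folder, description, files, links):
    """Store a description record in an activity folder.

    `files` maps file names to a type (e.g. dataset, procedure, sample,
    document); `links` lists paths, relative to `folder`, of items that
    were produced in other activities.
    """
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    record = {"description": description, "files": files, "links": links}
    target = folder / DESCRIPTION_FILE
    target.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return target


def broken_links(folder):
    """Return the links in an activity's description that no longer resolve."""
    folder = Path(folder)
    record = json.loads((folder / DESCRIPTION_FILE).read_text(encoding="utf-8"))
    return [link for link in record["links"] if not (folder / link).exists()]
```

Linking to the original item instead of copying it into the folder is what avoids the problems with multiple copies, and a check such as `broken_links` shows when a linked item has been moved or deleted.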
Example – Organising by activity plus descriptions and links Easy to navigate: for each question you quickly find the right folder, even if you had no prior knowledge. Descriptions help you to find and understand the data. Links make the whole process traceable.
...and now we are also better able to...
Lab notes Notes taken in the lab are often unstructured May also cover different projects Splitting the notes per activity helps Can also split the notes into method, samples, data etc. These are more findable and easier to reuse How far you go depends on the time you have and what is necessary for understanding the data The same goes for other large, unstructured files
Documenting data It takes time! But it’s an investment – not time lost
Why document your data? If you store your files in an easily understandable structure with descriptions and links, you can: ● See your research in context ● Find (and understand) information more easily ● Make your research traceable ● Make your research reusable
Questions?