What is data and why should you care? Dr. Kalpana Shankar, School of Information and Library Studies, UCD, 5 November 2012
What do Apollo 11, the Domesday Project, and award-winning scientists from the US National Science Foundation have in common?
What is research data?
"The data, records, files or other evidence, irrespective of their content or form (e.g. in print, digital, physical or other forms), that comprise research observations, findings or outcomes, including primary materials and analysed data." – Australian National Data Service
Examples:
– Statistics and measurements
– Results of experiments or simulations
– Observations, e.g. fieldwork
– Survey results, print or online
– Interview recordings and transcripts
– Images, from cameras and scientific equipment
What is data? Any information you use in your research
"PhD students lose material all the time…and they are exactly the people who want to be backing up. These are people who are creating data which are life-and-death important to them."
Why are we talking about data management?
"The whole thing is incredibly dull."
Rising volume and complexity of research data
According to the European Bioinformatics Institute, the volume of new biological data is doubling every 5 months
For example, in genomics:
– we can now analyse the equivalent of a human genome every 14 minutes at a cost of $5, times quicker than when the draft human genome was first published in
– 1,000 Genomes Project: 200 terabytes, the equivalent of 16 million file cabinets filled with text, or more than 30,000 standard DVDs
A hard drive after 6 years' research
113 GB
42,699 files
3,466 folders
Image by Lindsay Lloyd-Smith
So, why is data management important for research?
– It is increasingly integral to all areas of research
– It is a rapidly escalating issue
– It is important to research funders, and follow-up is likely to increase in the future
– It has major resource implications, which need to be planned for carefully
In short, it creates major challenges which aren't going to go away!
Why data management is important to YOU (II)
What would happen to your data if there was a fire or theft in your office, department or home?
Image: Fire by andrewmalone via flickr
Writing a Data Management Plan
1. Formalises the definition of your research data
2. Documents the contextual and technical details of your data
3. Checks your file structure and naming
4. Plans for data sharing, access, and archiving
Getting started
Your Data Management Plan won't be perfect
It is not a static document
– Change and update it as your research progresses and you understand more about your data
Think about key issues that might affect your data…
o …while you work on them
o …in the future
It's better to have a plan that covers some aspects than no plan at all
Ask for advice if you're uncertain
Questions to ask yourself
Platform: Windows, Macintosh and/or Unix?
Objective: Store? Manage? Share? Publish?
Extent of collaboration
– Your research group/lab only
– Your group + externals
– Cast of thousands?
Nature of data?
– Level of security?
– Human records (de-identified)?
– Intellectual Property?
Amount of data? MB? GB? TB?
– Rate of accumulation of data?
– How much needed online to do useful work?
– Period of preservation?
Give your data a structure… By Anne (Flickr ID: I like): Voltaire & Rousseau CC BY-NC-ND 2.0 By twechy (Flickr ID): Library Bookshelf CC BY 2.0 …it makes it easier to find things
Something to try: Use post-it notes to create a map of your file structure
– Write each existing file and folder name onto a post-it
– Arrange folders on your desk in a sensible hierarchy
– Put your files into folders
– Do you need new folders? Do you have too many?
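If your files are already on a computer, a short script can produce a first draft of the same map. The sketch below is only one possible approach, not part of the original exercise: the function names map_structure and count_items are made up for illustration, and it assumes you point it at a folder you are allowed to read.

```python
from pathlib import Path


def map_structure(root, indent=0):
    """Return an indented outline of the folders and files under root."""
    lines = []
    # List folders before files, each group in alphabetical order
    entries = sorted(Path(root).iterdir(), key=lambda p: (p.is_file(), p.name.lower()))
    for entry in entries:
        lines.append("  " * indent + entry.name + ("/" if entry.is_dir() else ""))
        if entry.is_dir():
            lines.extend(map_structure(entry, indent + 1))
    return lines


def count_items(root):
    """Count files and folders below root (root itself is not counted)."""
    files = folders = 0
    for entry in Path(root).rglob("*"):
        if entry.is_dir():
            folders += 1
        elif entry.is_file():
            files += 1
    return files, folders
```

Printing "\n".join(map_structure("my_project")) for a folder of your own (the name my_project is a placeholder) gives an outline you can compare against your post-it map.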
What's in a name?
Names tell us what a file is (contextual information)
Use a combination of different types of information to make context and content clear, e.g.
– Author (or initials)
– Date
– Data source
– Theme
– Experiment
– Sample
…But try not to let file names get too long
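One way to apply a naming convention consistently is to let a small function build the names. This is a sketch under assumptions, not a recommended standard: the field order (initials, date, source, theme), the underscore separator, the YYYYMMDD date format and the 60-character limit are all illustrative choices, and build_filename is a hypothetical helper.

```python
import re
from datetime import date


def build_filename(initials, when, source, theme, extension, max_length=60):
    """Combine context fields into one name: initials_date_source_theme.ext"""

    def clean(text):
        # Replace runs of spaces and punctuation with a single hyphen
        return re.sub(r"[^A-Za-z0-9]+", "-", text).strip("-")

    parts = [clean(initials), when.strftime("%Y%m%d"), clean(source), clean(theme)]
    name = "_".join(p for p in parts if p) + "." + extension.lstrip(".")
    # Assumed limit: refuse names that have grown too long to read easily
    if len(name) > max_length:
        raise ValueError(f"File name is {len(name)} characters; shorten a field")
    return name
```

For example, build_filename("KS", date(2012, 11, 5), "field survey", "soil pH", "csv") returns "KS_20121105_field-survey_soil-pH.csv". Putting the date in year-month-day order means names sort chronologically.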
Why create documentation?
Creating documentation might seem like a waste of time
Good documentation will include a lot of information that might seem obvious
Document your data as you go
If you don't, it may become impossible for you – or someone else – to understand and re-use data later on
Image: Question Mark Sign by Colin_K on flickr
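Documenting as you go can be as simple as saving a small description file next to each data file. The sketch below shows one possible way to do that; the field names (description, method, units, author), the ".meta.json" suffix and the function write_sidecar are assumptions made for illustration, not an established metadata standard.

```python
import json
from datetime import date
from pathlib import Path


def write_sidecar(data_file, description, method, units, author):
    """Write <data_file>.meta.json describing the data file next to it."""
    record = {
        "file": Path(data_file).name,
        "description": description,
        "method": method,
        "units": units,
        "author": author,
        # Record when the description was written, not when the data were collected
        "documented_on": date.today().isoformat(),
    }
    sidecar = Path(str(data_file) + ".meta.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar
```

A plain-text README would serve the same purpose; what matters is that the description is written while the details are still fresh.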
Make research material understandable
What's obvious now might not be in a few months, years, decades…
MAKE SURE YOU CAN UNDERSTAND IT LATER
Make research reproducible
– Detailing your methodology helps people understand your research better
– Explaining your algorithms, search methods etc. makes your work reproducible
– Conclusions can be verified
Image by woodleywonderworks on flickr
Make material reusable
Material may be re-used by someone in a different discipline
Provide context to minimise the risk of it being misunderstood or misused
Backing up
– Lots Of Copies Keeps Stuff Safe (LOCKSS): make multiple back-ups
– Keep back-ups in a separate place to the original
– Use different types of storage media, e.g. CDs, pen drives, networked storage, external hard drive
From: Copy Copy Copy by David Goehring (CarbonNYC) via flickr
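The advice above can be partly automated. The following is a minimal sketch, assuming the back-up locations (for example an external drive and a network share) are reachable as ordinary folders; the function names sha256_of and back_up are hypothetical. It copies one file to several places and checks each copy against the original with a checksum, so a damaged copy is noticed straight away.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source, destinations):
    """Copy source into each destination folder and verify every copy."""
    source = Path(source)
    original = sha256_of(source)
    copies = []
    for folder in destinations:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / source.name
        shutil.copy2(source, target)  # copy2 also keeps timestamps where possible
        if sha256_of(target) != original:
            raise IOError(f"Backup at {target} does not match the original")
        copies.append(target)
    return copies
```

A script like this only helps if the destinations really are separate places and different media; two folders on the same hard drive do not count.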
For everything you keep… Make sure you can:
– find it again later
– understand it later
Where to get help
– Your supervisor
– Library
– Funding agencies
– The Earth Institute will be putting up links on its website
Oh yes…what do Apollo, the Domesday Project, and award-winning scientists from the US National Science Foundation have in common?
Questions? My contact information: – Kalpana Shankar