Dr Joanna Goodger Information Hertfordshire With Bill Worthington, Sara Hajnassiri, and Mohamed Hansraj Research Data Management For Researchers.


1 Dr Joanna Goodger Information Hertfordshire With Bill Worthington, Sara Hajnassiri, and Mohamed Hansraj Research Data Management For Researchers

2 LET’S GET GOING Research Data Management

3 Decision making
In this module, we’ll discuss how best to set up your research:
- Filing systems: naming, formats, and versioning
- Metadata: what to include and how
- Software: longevity and stability
- Documentation: logs, instructions, and records
- Coding for the future
Getting Started with Research Data Management

4 Why do it now? The end point of all projects involves making the data publicly available. Many datasets will be deposited in national archives, which have regulations for files and metadata. Thinking about the requirements at the beginning of the project will limit the transformations needed at the end of the project. If your file formats have a low risk of obsolescence and are free and openly available, then you’re on the path to long-lived files, but you should also consider degradation, compression, and the fidelity of your data.

5 FILING SYSTEMS

6 Filing Systems. Filing is more than saving files; it’s making sure you can find them later in your project. Naming, directory structure, file types, and versioning all help to keep your data safe and accessible.

7 Activity: What is data? What does data mean to you? Spend a couple of minutes thinking about what data you will be working with throughout your project. Then we’ll combine your ideas and compare them between disciplines.

8 Naming Conventions: What’s in a name? Creating systematic names can be as simple as assigning a prefix or a number to each object, in which case they are a type of numbering scheme. Using a naming convention means that you can distinguish similar records from one another at a glance. You can combine information to form logical file names, changing sections of the name to reflect the differences between the files.
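As an illustration of combining information into a logical file name, here is a minimal Python sketch. The field order, the separators, and the helper name `build_filename` are assumptions made for this example, not a convention prescribed by the presentation.

```python
from datetime import date


def build_filename(project, description, created, version, extension):
    """Combine fixed fields into one systematic file name (hypothetical scheme)."""
    # Lower-case the text fields and replace spaces so that names behave
    # the same way on every operating system.
    cleaned = ["-".join(part.lower().split()) for part in (project, description)]
    # ISO-style dates (YYYYMMDD) and zero-padded versions sort correctly
    # in an ordinary file listing.
    stamp = created.strftime("%Y%m%d")
    suffix = extension.lstrip(".")
    return f"{cleaned[0]}_{cleaned[1]}_{stamp}_v{version:02d}.{suffix}"


# Only the version section changes between successive files.
first = build_filename("RDM", "survey results", date(2013, 5, 1), 3, "csv")
second = build_filename("RDM", "survey results", date(2013, 5, 1), 4, "csv")
print(first)
print(second)
```

Because every name follows the same pattern, similar records differ only in the section that actually changed, which is what makes them distinguishable at a glance.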

9 File formats
The formats most likely to be accessible in the future are:
- non-proprietary
- in an open, documented standard
- commonly used by the research community
- in a standard representation, e.g. ASCII, Unicode
- unencrypted and uncompressed

10 File formats: Tables, Code, Plots, Transcripts, Audio-Visual, Images / Photos

11 File formats
Images: raw, processed, plotted, photos, scans, CAD
  Formats: FITS, JPG, PNG, BMP, PS
  Uses: reuse, paper, talk, poster, archive, web
  Considerations: use, size, longevity
Tables: catalogues, query results, calculations, measurements
  Formats: text files, FITS, spreadsheets
  Uses: code input, spectra, plot, paper, CDS
  Considerations: use, metadata, accessibility
Source code: models, simulations, scripts, inputs, outputs, instructions
  Formats: .c, .pl, .py, .idl, README, Makefile, input, output
  Uses: third party edit, run, paper, web
  Considerations: user friendly; functions, size
Interviews: audio, video, written transcript
  Formats: .txt, .odt, .doc, .mp3, .mp4, .avi
  Uses: producing transcripts, further analysis
  Considerations: format, longevity, security, metadata

12 File formats
Examples of preferred format choices:
- PDF/A, not Word
- ASCII, not Excel
- MPEG-4, not QuickTime
- TIFF or JPEG2000, not GIF or JPG
- XML or RDF, not RDBMS
When considering the best file formats for your data, you should think about cross-platform formats and the simplest forms.

13 File sizes. The format you choose will also affect the compression of your data and how much storage space you’re going to need to keep your data safe and accessible. Consider a 5 Megapixel image. The table below gives the size of that file in different standard formats. You can see what a difference your format makes to your storage requirements. You should think about which is best for your outputs: for the RDM website, resizing the image saves space and prevents the image from being distorted by compression in the browser.
Format                     Size
JPG                        1.5 MB
JPG resized (1024 x 776)   0.2 MB
PNG                        9.0 MB
BMP                        15.0 MB
TIFF                       3.0 MB
PDF                        0.8 MB
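The sizes quoted on this slide can be related to the uncompressed size of the image with a little arithmetic. The sketch below assumes 24-bit colour (3 bytes per pixel) and 1 MB = 1,000,000 bytes; under those assumptions a 5 Megapixel image needs 15 MB, which is consistent with the BMP figure. The function names are illustrative only.

```python
# Sizes quoted on the slide for one 5 Megapixel image, in megabytes.
SIZES_MB = {
    "JPG": 1.5,
    "JPG resized (1024 x 776)": 0.2,
    "PNG": 9.0,
    "BMP": 15.0,
    "TIFF": 3.0,
    "PDF": 0.8,
}


def uncompressed_size_mb(pixels, bytes_per_pixel=3):
    """Size of the raw pixel data, assuming 3 bytes (24 bits) per pixel."""
    return pixels * bytes_per_pixel / 1_000_000


def storage_needed_mb(format_name, image_count):
    """Storage needed for a number of images of the size quoted on the slide."""
    return SIZES_MB[format_name] * image_count


raw_mb = uncompressed_size_mb(5_000_000)
for name, size in SIZES_MB.items():
    # Ratio of each stored file to the raw pixel data.
    print(f"{name}: {size} MB, {size / raw_mb:.0%} of the raw size")

print("1000 images as BMP:", storage_needed_mb("BMP", 1000), "MB")
print("1000 images as JPG:", storage_needed_mb("JPG", 1000), "MB")
```

Scaling the per-file sizes up to the number of images you expect to collect gives a first estimate of the storage a project will need for each candidate format.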

14 Versioning: Keep editing under control. Whether you’re developing software or writing a document, keeping track of changes made by you and your collaborators is useful: you can check that issues have been addressed, and mistakes can be undone. Some software will automatically control your versions, while other software requires you to ‘Save As’ a new version, every day or every time changes are made. Cloud storage facilities such as LiveDrive and RackSpace, as well as the UH Document Management System (DMS), lock documents while they are being edited, so you cannot work on the same file as others, which prevents overwriting.

15 METADATA

16 What is metadata? Metadata is additional information that is required to make sense of your files: it’s data about data. This is not a new idea; consider your music or film collection. At the least, there is the title, authors, release date, producers, directors, etc. There may also be the artwork, the studio, or the format it was released in, such as LP (shown left), tape, CD, MD, Video, super 8, DVD, Blu-ray, 3D, etc. All this information is metadata; it allows you to make sense of the data and search the collection for the track that you’re looking for.

17 Data metadata: How will you capture additional information? Music and video files embed a lot of information. (Figure: file info displayed using WinAmp.)

18 Data metadata
You need to consider:
- What contextual details are needed? e.g. a description of the capture methods and data analysis.
- How will you capture additional information? e.g. in papers, in a database, in a ‘readme’ text file, in file properties/headers.
- Which standards will you use and why? Data centre recommendations for metadata, controlled vocabularies, and required documentation.
- Are there any encoding guidelines you should follow?

19 Data metadata: What contextual details are needed? Without additional information we do not know: Who is in this picture? When was it taken? Where are they? Who took this photo? How was this picture taken? All this information puts this image in context. Without it, it could be a photo taken in the 1800s of Mr and Mrs Straus, who died on the Titanic, or a Photoshop-adjusted image of a young couple dressing up at Brighton pier in 2005. Without additional information we just don’t know.

20 Data metadata: How will you capture additional information?
Many of the analysis and development details will be in your published work (journal papers, conference proceedings, or articles, for example), but if your data is separated from this publication, can others make sense of it?
- If you have a results table or database, you should ensure that metadata is provided for each column and/or row.
- You need to record instructions for use for any software developed.
- Your images need to have the required properties, which can be attached automatically, or you can add more information manually.

21 Data metadata: Which standards will you use and why? Many data centres recommend particular metadata for the formats that they support. This may be controlled vocabularies or required documentation. Are you required to deposit in a particular data centre? Are there any encoding guidelines you should follow? Across the board, the standard set of metadata for data files is generally of the form: title, author, file type, size, format, version, date created, date modified, and software. Datasets also have standard metadata that describes the data collection.
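One way to capture the standard set of file metadata listed on this slide is a small text file stored next to each data file. The sketch below is only an illustration: the JSON layout, the `.meta.json` suffix, the function names, and the example values are assumptions, and a data centre may require a different standard.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def describe_file(path, title, author, data_format, version, date_created, software):
    """Collect the descriptive fields for one data file (hypothetical layout)."""
    info = path.stat()
    modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
    return {
        "title": title,
        "author": author,
        "file_type": path.suffix.lstrip(".") or "unknown",
        "size_bytes": info.st_size,           # read from the file system
        "format": data_format,
        "version": version,
        "date_created": date_created,         # supplied by the researcher
        "date_modified": modified.isoformat(),
        "software": software,
    }


def write_sidecar(path, metadata):
    """Write the metadata to '<file name>.meta.json' beside the data file."""
    sidecar = path.with_name(path.name + ".meta.json")
    sidecar.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return sidecar


# Demonstration on a temporary file so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as folder:
    data_file = Path(folder) / "results_v01.csv"
    data_file.write_bytes(b"id,value\n1,0.5\n")
    record = describe_file(
        data_file, "Example results", "A. Researcher",
        "comma-separated text", "01", "2013-05-01", "example-script 0.1",
    )
    sidecar = write_sidecar(data_file, record)
    loaded = json.loads(sidecar.read_text(encoding="utf-8"))
    print(json.dumps(loaded, indent=2))
```

Keeping the metadata in a plain, open text format means it stays readable even if the data file is later separated from the publication that describes it.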

22 SOFTWARE

23 Software
When choosing software, ask:
- Is it unique to your equipment?
- Is it stable or under development?
- Is it free to use?
- Is it available on multiple operating systems?
- Is it licensed?
- Does it produce isolated formats?
- Is it backwards compatible?

24 Software Obsolescence
Whether planned or not, obsolescence affects software, which will affect the longevity of your data if it is produced or stored in a format specific to that software.
Technical or functional obsolescence: if your equipment has a limited life expectancy, the software may be short-lived.
- Store your data in the native format AND in a re-usable, standardised format.
- Use stable, open software for your analysis where possible.

25 Software Obsolescence
Whether planned or not, obsolescence affects software, which will affect the longevity of your data if it is produced or stored in a format specific to that software.
Systematic obsolescence: technology evolves, the demands on software increase, and new editions are released.
- Previous documents may not be compatible with new editions.
- Save data in an open format.
- Use free, stable software for your analysis.

26 Software. It may be that your collaborators use different operating systems from you. Just because it works on Windows doesn’t mean it works on Linux. Check whether there is suitable software for your colleagues to access your data. Try to use free, open source options where possible. (Windows, Linux, Apple Mac.)

27 DOCUMENTATION

28 Lab Books: Why keep a Lab Book?
Records are important for the development and writing up of your research. You should keep a lab book of your research so that:
- the experiment or measurement can be completely reconstructed and redone later
- the work can be repeated for re-evaluation of the reported results
- steps that led to the success or failure of a large project can be extracted
- patent lawyers have the properly documented evidence of inventions that they need

29 Lab Books. Paper lab books are at risk of loss or damage, and cannot be easily searched. An electronic lab notebook (ELN) is a computer program designed to replace paper lab books; ELNs are easier to search, simplify data copying and backups, and support collaboration.

30 Lab Books
A good log should include:
- Steps, procedures, and precautions which are not obvious
- References to other people’s work, ideas, hints, and inputs
- Parameters which might affect the outcome of the experiment
- Equipment used, type numbers, serial numbers, any calibration steps taken
- Sketches of experimental layout and traces on recorders, oscilloscopes, etc.
- The date and time, names of other people observing
- Rough error analyses taken during the experiment, repeat observations of doubtful readings, calibration errors allowed for

31 Software Documentation
A piece of code without adequate documentation cannot be efficiently or effectively developed, nor can it be understood by users in the future. Documentation comes in many forms:
- Requirements: statements that identify attributes, capabilities, characteristics, or qualities of a system
- Architecture: an overview of the software, its purpose and its relations to an environment
- Technical: the algorithms, interfaces, and APIs
- End User: manual for end users, system administrators, and support staff
- Marketing: how to market the product and analysis of the market demand

32 Software Documentation
In a research project lifecycle, these documentation forms are appropriate to different stages: the initial development, using the software for analysis, publishing the development and results of your research, and reuse by others later.
- Requirements (stage: Using): statements that identify attributes, capabilities, characteristics, or qualities of a system
- Architecture (stage: Using and Writing Up): an overview of the software, its purpose and its relations to an environment
- Technical (stage: Writing Up): the algorithms, interfaces, and APIs
- End User (stage: Using): manual for end users, system administrators, and support staff
- Marketing (stage: Reuse): how to market the product and analysis of the market demand

33 CODING

34 Coding. When writing software or analytical code, it is important that others and your future self can understand what the code is doing. Wilson et al. (2013) published 10 steps that they regard as the “Best Practices for Scientific Computing”, and we agree. “As scientists are never taught how to build software many are unaware of tools and practices that would allow them to write more reliable and maintainable code with less effort. We describe a set of best practices for scientific software development that have solid foundations in research and experience, and that improve scientists’ productivity and the reliability of their software.” http://arxiv.org/pdf/1210.0530v3.pdf

35 Best Practice Coding (Wilson et al. 2013)
1. Write programs for people, not computers
- A program should not require its readers to hold more than a handful of facts in memory at once.
- Names should be consistent, distinctive, and meaningful.
- Code style and formatting should be consistent.
- All aspects of software development should be broken down into tasks roughly an hour long (50-200 lines of code).

36 Best Practice Coding (Wilson et al. 2013)
2. Automate repetitive tasks
- Rely on the computer to repeat tasks.
- Save recent commands in a file for reuse; this could be as simple as using Make.
- Use a build tool to automate your scientific workflows.
3. Use the computer to record history
- Software tools should be used to track computational work automatically.
- It is already possible to record the unique identifiers and version numbers for raw data records, programs, and libraries, and the names and version numbers of programs and the values of parameters used to generate any given output.
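A minimal sketch of letting the computer record history is shown below: each output is accompanied by a record of the program, its version, the parameter values, and an identifier for the raw data. Using a SHA-256 checksum as the identifier is an assumption of this example, and the program name, version, and parameters are hypothetical.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone


def checksum(data):
    """Use a SHA-256 digest as a unique identifier for a raw data record."""
    return hashlib.sha256(data).hexdigest()


def provenance_record(program, program_version, parameters, raw_data):
    """Describe how one output was generated (hypothetical record layout)."""
    return {
        "program": program,
        "program_version": program_version,
        "python_version": platform.python_version(),
        "parameters": dict(parameters),
        "raw_data_sha256": checksum(raw_data),
        "generated_at": datetime.now(timezone.utc).isoformat(),
    }


# Hypothetical analysis run: the names and values are for illustration only.
record = provenance_record(
    "fit_spectrum.py", "0.3.1",
    {"bins": 64, "threshold": 0.05},
    b"wavelength,flux\n",
)
print(json.dumps(record, indent=2))
```

Writing such a record automatically alongside every result means the history does not depend on anyone remembering to note it down.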

37 Best Practice Coding (Wilson et al. 2013)
4. Make incremental changes
- Work in small steps with frequent feedback and course correction.
- At each stage of this incomplete code, check that it is working correctly.
5. Use version control
- Keeping alterations in successive versions means that changes can be reverted and the work can be developed collaboratively.
- Use a standard version control system (VCS).
- Everything that has been created manually should be put in version control.

38 Best Practice Coding (Wilson et al. 2013)
6. Don’t repeat yourself (or others)
Programmers use the DRY principle to avoid repeating data analysis and rewriting code:
- Every piece of data must have a single authoritative representation in the system.
- At small scales, code should be modularized rather than copied and pasted.
- At large scales, re-use code instead of rewriting it.
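As a small illustration of modularizing rather than copying and pasting, the sketch below puts one calculation in a single function and calls it for two datasets. The function and the data values are invented for this example.

```python
def normalise(values):
    """Scale a list of numbers so that they run from 0 to 1."""
    low, high = min(values), max(values)
    if high == low:
        raise ValueError("cannot normalise data that are all the same value")
    return [(value - low) / (high - low) for value in values]


# The same function serves both datasets, so a fix or improvement is made
# in one place instead of in every pasted copy.
temperatures = normalise([12.0, 15.0, 18.0])
pressures = normalise([990.0, 1000.0, 1010.0])
print(temperatures)
print(pressures)
```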

39 Best Practice Coding (Wilson et al. 2013)
7. Plan for mistakes (they’re inevitable)
- Defensive programming: add assertions to programs to check their operation. They ensure that if something goes wrong, the program halts immediately, which aids debugging. They are also executable documentation, i.e. they explain the program as well as checking its behaviour.
- Automated testing: check that a single unit of code is returning correct results, or that the behaviour of a program hasn’t changed.
- Use an off-the-shelf unit testing library to initialize inputs, run tests, and report their results in a uniform way.
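The two ideas on this slide can be sketched together in Python, using assertions inside a function and the standard `unittest` library to test it. The function `fraction_detected` and its values are hypothetical, chosen only to keep the example short.

```python
import unittest


def fraction_detected(detected, total):
    """Return the fraction of sources that were detected."""
    # Preconditions: halt immediately if the inputs cannot be right.
    assert total > 0, "total must be positive"
    assert 0 <= detected <= total, "detected must lie between 0 and total"
    fraction = detected / total
    # Postcondition: states, and checks, what a valid result looks like.
    assert 0.0 <= fraction <= 1.0, "fraction must lie between 0 and 1"
    return fraction


class FractionDetectedTest(unittest.TestCase):
    def test_known_value(self):
        # A single unit of code returns the correct result.
        self.assertEqual(fraction_detected(3, 4), 0.75)

    def test_bad_input_halts(self):
        # Impossible input stops the program instead of giving a wrong answer.
        with self.assertRaises(AssertionError):
            fraction_detected(5, 4)


# Load and run the tests, reporting the results in a uniform way.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(FractionDetectedTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

Note that Python skips `assert` statements when run with the `-O` option, so assertions complement, rather than replace, explicit error handling for user input.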

40 Best Practice Coding (Wilson et al. 2013)
7. Plan for mistakes (they’re inevitable), continued
- Use a variety of oracles: an oracle tells a developer how a program should behave or what its output should be. In research this includes analytical results, experimental results, and previous results from other tried and tested software.
- Turn bugs into test cases: write tests that trigger the bug and will prevent that bug from reappearing later.
- Use a symbolic debugger, which allows you to pause a program, inspect the variable values, and move up and down the code to find the problem.
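The sketch below illustrates an oracle and a bug turned into a test case. The bug itself is hypothetical (an imagined earlier version of `median` that mishandled even-length input), and Python’s standard `statistics.median` plays the role of the tried and tested software used as an oracle.

```python
import statistics


def median(values):
    """Return the median of a non-empty list of numbers."""
    if not values:
        raise ValueError("median() needs at least one value")
    ordered = sorted(values)
    middle = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[middle]
    # Hypothetical bug fix: an earlier version returned ordered[middle]
    # here, which is wrong for an even number of values.
    return (ordered[middle - 1] + ordered[middle]) / 2


def test_even_length_regression():
    # This test triggers the old bug, so it cannot quietly reappear.
    assert median([1, 2, 3, 4]) == 2.5


def test_against_oracle():
    # Compare with an independent, well-tested implementation.
    for sample in ([5], [3, 1, 4, 1, 5, 9], [2.0, 8.0, 4.0]):
        assert median(sample) == statistics.median(sample)


test_even_length_regression()
test_against_oracle()
print("all median tests passed")
```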

41 Best Practice Coding (Wilson et al. 2013)
8. Optimize software only after it works correctly
- In most cases, the most productive way of optimizing code is to get it working correctly, then identify areas that can be sped up.
- Use a profiler to identify bottlenecks in your code.
- Write code in the highest-level language possible; you can always shift to a low-level language (like C or Fortran) if the performance boost is needed.
9. Document design and purpose, not mechanics
- Refactor code instead of explaining how it works, i.e. rather than write a paragraph to explain a complex piece of code, reorganize it so that it’s self-explanatory.
- Embed the documentation for a piece of software in that software.
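As a sketch of using a profiler, the example below runs Python’s built-in `cProfile` on a deliberately simple function and prints where the time went. The function is invented for the illustration; its docstring also shows documentation that states purpose and is embedded in the software itself.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Return the sum of i*i for i from 0 to n-1.

    Purpose: a stand-in for a working but unoptimized analysis step.
    """
    total = 0
    for i in range(n):
        total += i * i
    return total


# Profile one call; runcall returns whatever the profiled function returns.
profiler = cProfile.Profile()
result = profiler.runcall(slow_sum_of_squares, 100_000)

# Collect the five most expensive entries, sorted by cumulative time.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

The report lists each function with the time spent in it, which points to the bottleneck worth optimizing once the results are known to be correct.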

42 Best Practice Coding (Wilson et al. 2013)
10. Collaborate
- Code reviews are the most cost-effective way of finding bugs in code.
- Use pair programming when bringing someone new up to speed and when tackling particularly tricky problems: one developer writes the code while the other provides real-time feedback.
- In larger teams of developers, use an issue tracking tool to maintain a list of tasks to be performed and bugs to be fixed.

