Research Data Management For Researchers
Dr Joanna Goodger, Information Hertfordshire, with Bill Worthington, Sara Hajnassiri, and Mohamed Hansraj

LET'S GET GOING

Decision Making
In this module, we'll discuss how best to set up your research:
- Filing systems: naming, formats, and versioning
- Metadata: what to include and how
- Software: longevity and stability
- Documentation: logs, instructions, and records
- Coding for the future
Getting Started with Research Data Management

Why do it now?
The end point of all projects involves making the data publicly available. Much of your data will be deposited in national archives, which have regulations for files and metadata. Thinking about these requirements at the beginning of the project will limit the transformations needed at the end. If your file formats have a low risk of obsolescence and are free and openly available, then you're on the path to long-lived files, but you should also consider degradation, compression, and the fidelity of your data.

FILING SYSTEMS

Filing is more than saving files; it's making sure you can find them later in your project. All of the following help to keep your data safe and accessible:
- Naming
- Directory structure
- File types
- Versioning

Activity
What is data? What does data mean to you? Spend a couple of minutes thinking about what data you will be working with throughout your project. Then we'll combine your ideas and compare them between disciplines.

Naming Conventions
What's in a name? Creating systematic names can be as simple as assigning a prefix or a number to each object, in which case the convention is a type of numbering scheme. Using a naming convention means that you can distinguish similar records from one another at a glance. You can combine several pieces of information to form logical file names, changing only the sections that reflect the differences between files.
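As an illustration of combining information into logical names, here is a minimal Python sketch of one possible convention. The scheme project_instrument_date_runNNN.ext, the function names, and the underscore separator are hypothetical choices for this example, not a standard recommended by the slides. ISO dates and zero-padded run numbers are used so that names sort chronologically; the parser assumes no field itself contains an underscore.

```python
from datetime import date


def build_name(project, instrument, day, run, extension):
    """Combine fixed fields into one systematic file name (hypothetical scheme).

    ISO dates (YYYY-MM-DD) and zero-padded run numbers make names for the same
    project and instrument sort in chronological order in a file browser.
    """
    return f"{project}_{instrument}_{day.isoformat()}_run{run:03d}.{extension}"


def parse_name(name):
    """Split a name produced by build_name back into its fields.

    Assumes no field contains an underscore; otherwise unpacking fails.
    """
    stem, extension = name.rsplit(".", 1)
    project, instrument, day, run = stem.split("_")
    return {
        "project": project,
        "instrument": instrument,
        "date": date.fromisoformat(day),
        "run": int(run[len("run"):]),  # strip the literal "run" prefix
        "extension": extension,
    }


print(build_name("rdm", "cam01", date(2024, 5, 1), 3, "fits"))
# rdm_cam01_2024-05-01_run003.fits
```

Only the changing sections (date, run number) differ between related files, which is what lets you tell them apart at a glance.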

File formats
The formats most likely to be accessible in the future are:
- non-proprietary
- in an open, documented standard
- commonly used by the research community
- in a standard representation, e.g. ASCII, Unicode
- unencrypted and uncompressed

File formats
Research data comes in many forms: tables, code, plots, transcripts, audio-visual recordings, and images/photos.

File formats

Type | Content | Formats | Uses | Considerations
Images | Raw, processed, plotted, photos, scans, CAD | FITS, JPG, PNG, BMP, PS | Reuse, paper, talk, poster, archive, web | Use, size, longevity
Tables | Catalogues, query results, calculations, measurements | Text files, FITS, spreadsheets | Code input, spectra, plot, paper, CDS | Use, metadata, accessibility
Source code | Models, simulations, scripts, inputs, outputs, instructions | .c, .pl, .py, .idl, README, Makefile, input, output | Third-party edit, run, paper, web | User friendly; functions, size
Interviews | Audio, video, written transcript | .txt, .odt, .doc, .mp3, .mp4, .avi | Producing transcripts, further analysis | Format, longevity, security, metadata

File formats
Examples of preferred format choices:
- PDF/A, not Word
- ASCII, not Excel
- MPEG-4, not QuickTime
- TIFF or JPEG2000, not GIF or JPG
- XML or RDF, not RDBMS
When considering the best file formats for your data, think about cross-platform formats and the simplest forms.
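To follow "ASCII, not Excel" for tabular results, one option is to export tables as plain-text CSV. The sketch below uses Python's standard csv module; the function name and the example columns are illustrative assumptions. It writes UTF-8, which is identical to ASCII for plain English text and numbers, and the resulting file can still be opened in Excel or any text editor.

```python
import csv
from pathlib import Path


def export_table_csv(rows, header, path):
    """Write a table as comma-separated plain text (UTF-8).

    newline="" lets the csv module control line endings, as its documentation
    recommends, so the file is written the same way on every platform.
    """
    with Path(path).open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)   # one header row naming each column
        writer.writerows(rows)    # then one line per record


# Example (hypothetical data):
# export_table_csv([["A1", 293.15, 1042]], ["sample_id", "temperature_K", "count"], "measurements.csv")
```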

File sizes
The format you choose also affects the compression of your data and how much storage space you will need to keep it safe and accessible. Consider a 5 megapixel image: the table below gives the size of that file in different standard formats, and shows what a difference your format makes to your storage requirements. Think about which format is best for your outputs. For the RDM website, resizing the image saves space and prevents the image from being distorted when the browser compresses it.

Format | File size
JPG | 1.5 MB
JPG resized (1024 x 776) | 0.2 MB
PNG | 9.0 MB
BMP | 15.0 MB
TIFF | 3.0 MB
PDF | 0.8 MB
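The BMP figure can be checked with a back-of-the-envelope calculation. This sketch assumes an uncompressed 24-bit RGB image (3 bytes per pixel) and decimal megabytes (1 MB = 10^6 bytes); real BMP files add a small header and row padding, so treat it as an approximation rather than an exact file size.

```python
def raw_rgb_size_mb(pixels, bytes_per_pixel=3):
    """Uncompressed image size in decimal megabytes (1 MB = 10**6 bytes)."""
    return pixels * bytes_per_pixel / 10**6


full = raw_rgb_size_mb(5_000_000)      # 5 megapixel image
resized = raw_rgb_size_mb(1024 * 776)  # resized web version, before JPG compression
print(f"full: {full:.1f} MB, resized: {resized:.1f} MB")
# full: 15.0 MB, resized: 2.4 MB
```

The 15.0 MB result matches the BMP row, which stores pixels uncompressed; the JPG rows show how much compression (and resizing) reduces that.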

Versioning
Keep editing under control. Whether you're developing software or writing a document, keeping track of the changes made by you and your collaborators is useful: you can check that issues have been addressed, and mistakes can be undone. Some software controls versions automatically, while other software requires you to 'Save As' a new version every day or every time changes are made. Cloud storage facilities such as LiveDrive and RackSpace, as well as the UH Document Management System (DMS), lock documents while they are being edited, so you cannot work on the same file as others at the same time; this prevents overwriting.
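For files whose software does not version them automatically, the manual "Save As" habit can be scripted. This is a minimal sketch, assuming a hypothetical naming pattern name_YYYY-MM-DD_vNN.ext; it copies the current file to the next free version number for that date and leaves the working file untouched. It is not a substitute for a real version control system.

```python
import shutil
from datetime import date
from pathlib import Path


def save_version(path, today=None):
    """Copy `path` to the next free dated version, e.g. report_2024-05-01_v01.txt.

    `today` can be passed in for reproducibility; it defaults to the current date.
    """
    src = Path(path)
    stamp = (today or date.today()).isoformat()
    n = 1
    while True:
        dest = src.with_name(f"{src.stem}_{stamp}_v{n:02d}{src.suffix}")
        if not dest.exists():  # never overwrite an earlier version
            break
        n += 1
    shutil.copy2(src, dest)  # copy2 also preserves the modification time
    return dest
```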

METADATA

Data metadata
What is metadata? Metadata is additional information that is required to make sense of your files: it's data about data. This is not a new idea. Consider your music or film collection: at least the title, authors, release date, producers, directors, etc., and maybe the artwork, the studio, or the format it was released in, such as LP, tape, CD, MD, video, Super 8, DVD, Blu-ray, 3D, etc. All this information is metadata, and it allows you to make sense of the data and search the collection for the track that you're looking for.

Data metadata
How will you capture additional information? Music and video files embed a lot of information.
(Image: file info displayed using WinAmp.)

Data metadata
You need to consider:
- What contextual details are needed? e.g. a description of the capture methods and data analysis.
- How will you capture additional information? e.g. in papers, in a database, in a 'readme' text file, in file properties/headers.
- Which standards will you use and why? e.g. data centre recommendations for metadata, controlled vocabularies, and required documentation.
- Are there any encoding guidelines you should follow?

Data metadata: what contextual details are needed?
Without additional information we do not know:
- Who is in this picture?
- When was it taken?
- Where are they?
- Who took this photo?
- How was this picture taken?
All this information puts the image in context. Without it, this could be a photo taken in the 1800s of Mr and Mrs Straus, who died on the Titanic, or a Photoshop-adjusted image of a young couple dressing up at Brighton pier in … Without additional information we just don't know.

Data metadata: how will you capture additional information?
Many of the analysis and development details will be in your published work (journal papers, conference proceedings, or articles, for example), but if your data is separated from this publication, can others make sense of it?
- If you have a results table or database, ensure that metadata is provided for each column and/or row.
- Record instructions for use for any software you develop.
- Your images need to have the required properties, which may be attached automatically; check whether you can add more information manually.

Data metadata: which standards will you use and why?
Many data centres recommend particular metadata for the formats that they support, such as controlled vocabularies or required documentation. Are you required to deposit in a particular data centre? Are there any encoding guidelines you should follow? Across the board, the standard set of metadata for data files generally takes the form: title, author, file type, size, format, version, date created, date modified, and software. Datasets also have standard metadata that describes the data collection.
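One lightweight way to keep that standard set of metadata with a data file, and to describe each column of a results table, is a "sidecar" file written next to it. The sketch below is an illustration under stated assumptions: the JSON layout, the `.meta.json` suffix, and the field names are hypothetical, not a data-centre standard, so check your repository's own requirements. The creation date is passed in because file-system creation times are not portable between operating systems.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def write_sidecar(data_path, *, title, author, version, created, data_format,
                  software, columns=None):
    """Write '<data file>.meta.json' next to a data file and return its path."""
    path = Path(data_path)
    stat = path.stat()
    modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
    meta = {
        "title": title,
        "author": author,
        "file_type": path.suffix.lstrip(".").upper(),  # taken from the extension
        "size_bytes": stat.st_size,
        "format": data_format,          # e.g. "CSV, UTF-8, comma-delimited"
        "version": version,
        "date_created": created,        # supplied by the caller
        "date_modified": modified.isoformat(timespec="seconds"),
        "software": software,
        "columns": columns or {},       # optional: meaning and units per column
    }
    sidecar = path.with_name(path.name + ".meta.json")
    sidecar.write_text(json.dumps(meta, indent=2), encoding="utf-8")
    return sidecar
```

Because the sidecar is plain JSON text, it stays readable even if the software that produced the data does not.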

SOFTWARE

Software
When choosing software, ask:
- Is it unique to your equipment?
- Is it stable or under development?
- Is it free to use?
- Is it available on multiple operating systems?
- Is it licensed?
- Does it produce isolated formats (formats that only it can read)?
- Is it backwards compatible?

Software obsolescence
Whether planned or not, obsolescence affects software, and this will affect the longevity of your data if it is produced or stored in a format specific to that software.
Technical or functional obsolescence: if your equipment has a limited life expectancy, its software may be short-lived.
- Store your data in the native format AND in a reusable, standardised format.
- Use stable, open software for your analysis where possible.

Software obsolescence
Whether planned or not, obsolescence affects software, and this will affect the longevity of your data if it is produced or stored in a format specific to that software.
Systematic obsolescence: technology evolves, the demands on software increase, and new editions are released. Previous documents may not be compatible with new editions.
- Save data in an open format.
- Use free, stable software for your analysis.

Software
Your collaborators may use different operating systems from you. Just because it works on Windows doesn't mean it works on Linux or Apple Mac. Check whether there is suitable software for your colleagues to access your data, and try to use free, open-source options where possible.
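A common way analysis code "works on Windows but not on Linux" is a hard-coded backslash in a file path: on Linux the backslash is an ordinary character, so "data\results" names a single file rather than a folder. Python's standard pathlib module joins path components with the correct separator for each system. The `results_path` helper and the folder layout below are hypothetical examples.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath


def results_path(base_dir, run):
    """Build a results path using the separator of the current operating system."""
    return Path(base_dir) / "results" / f"run{run:03d}.csv"


# The same joining code produces each system's native form:
print(PurePosixPath("data") / "results" / "run001.csv")    # data/results/run001.csv
print(PureWindowsPath("data") / "results" / "run001.csv")  # data\results\run001.csv
```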

DOCUMENTATION

Lab Books
Why keep a lab book? Records are important for the development and writing up of your research. You should keep a lab book of your research so that:
- the experiment or measurement can be completely reconstructed and redone later
- the work can be repeated to re-evaluate the reported results
- the steps that led to the success or failure of a large project can be extracted
- patent lawyers have the properly documented evidence of inventions that they need

Lab Books
Paper lab books are at risk of loss or damage, and cannot be easily searched. An electronic lab notebook (ELN) is a computer program designed to replace paper lab books: ELNs are easier to search, simplify data copying and backups, and support collaboration.

Lab Books
A good log should include:
- steps, procedures, and precautions which are not obvious
- references to other people's work, ideas, hints, and inputs
- parameters which might affect the outcome of the experiment
- equipment used, type numbers, serial numbers, and any calibration steps taken
- sketches of the experimental layout and traces on recorders, oscilloscopes, etc.
- the date and time, and the names of other people observing
- rough error analyses taken during the experiment, repeat observations of doubtful readings, and calibration errors allowed for
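If you keep part of your log electronically, a few of these items (date and time, observers, equipment, parameters) can be captured automatically. The sketch below is a hypothetical helper, not a real ELN: it appends one JSON object per line (the "JSON Lines" convention) to a plain-text file, never rewriting earlier entries, so the log stays searchable with ordinary text tools.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_entry(logbook, text, observers=(), equipment=None, parameters=None, now=None):
    """Append one timestamped entry to a JSON Lines log file and return it."""
    entry = {
        "timestamp": (now or datetime.now(timezone.utc)).isoformat(timespec="seconds"),
        "observers": list(observers),    # names of people present
        "equipment": equipment or {},    # e.g. type and serial numbers
        "parameters": parameters or {},  # settings that may affect the outcome
        "text": text,                    # free-text notes
    }
    # Mode "a" only ever adds to the end of the file; earlier entries are untouched.
    with Path(logbook).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry
```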

Software Documentation
A piece of code without adequate documentation cannot be efficiently or effectively developed, nor can it be understood by users in the future. Documentation comes in many forms:
- Requirements: statements that identify attributes, capabilities, characteristics, or qualities of a system
- Architecture: an overview of the software, its purpose, and its relations to an environment
- Technical: the algorithms, interfaces, and APIs
- End user: manuals for end users, system administrators, and support staff
- Marketing: how to market the product, and analysis of the market demand

Software Documentation
In a research project lifecycle, these forms of documentation suit different stages: the initial development, using the software for analysis, publishing the development and results of your research, and reuse by others later.
- Requirements: Using
- Architecture: Using and Writing Up
- Technical: Writing Up
- End user: Using
- Marketing: Reuse

CODING

Coding
When writing software or analytical code, it is important that others and your future self can understand what the code is doing. Wilson et al. (2013) published ten practices that they regard as the "Best Practices for Scientific Computing", and we agree: "As scientists are never taught how to build software many are unaware of tools and practices that would allow them to write more reliable and maintainable code with less effort. We describe a set of best practices for scientific software development that have solid foundations in research and experience, and that improve scientists' productivity and the reliability of their software."

Best Practice Coding
1. Write programs for people, not computers
- A program should not require its readers to hold more than a handful of facts in memory at once.
- Names should be consistent, distinctive, and meaningful.
- Code style and formatting should be consistent.
- All aspects of software development should be broken down into tasks roughly an hour long (… lines of code).
Wilson et al. (2013)
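To show what "consistent, distinctive, and meaningful" names buy you, here are two functions that compute the same thing. The kinetic-energy formula is just an illustrative example. In the first, the reader has to remember what `a` and `b` mean and which units they are in; in the second, the names carry that information.

```python
# Hard to read: the reader must remember what a and b stand for.
def f(a, b):
    return 0.5 * a * b ** 2


# Easier to read: the names state the quantity and its units.
def kinetic_energy_joules(mass_kg, speed_m_per_s):
    return 0.5 * mass_kg * speed_m_per_s ** 2


print(kinetic_energy_joules(2.0, 3.0))  # 9.0
```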

Best Practice Coding
2. Automate repetitive tasks
- Rely on the computer to repeat tasks.
- Save recent commands in a file for reuse; this could be as simple as using Make.
- Use a build tool to automate your scientific workflows.
3. Use the computer to record history
- Software tools should be used to track computational work automatically. It is already possible to record:
  - unique identifiers and version numbers for raw data records, programs, and libraries
  - the names and version numbers of programs, and the values of parameters used to generate any given output
Wilson et al. (2013)
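Recording history can be built into an analysis script itself. The sketch below is one hypothetical way to do it with only the Python standard library: alongside each output it writes a small JSON "provenance" file holding a SHA-256 checksum of each input (a unique identifier for that exact data), the Python version and platform, and the parameter values used. The file layout and field names are assumptions for illustration; dedicated workflow tools offer richer versions of this idea.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path):
    """Checksum a file in chunks so that large data files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def record_provenance(output_path, inputs, parameters):
    """Write '<output>.provenance.json' describing how an output was produced."""
    record = {
        "output": str(output_path),
        "created": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "python": platform.python_version(),
        "platform": platform.platform(),
        "inputs": {str(p): sha256_of(p) for p in inputs},  # identifies exact input data
        "parameters": parameters,                          # values used for this run
    }
    prov_path = Path(str(output_path) + ".provenance.json")
    prov_path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return record
```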

Best Practice Coding
4. Make incremental changes
- Work in small steps with frequent feedback and course correction.
- At each stage of the incomplete code, check that it is working correctly.
5. Use version control
- Keeping alterations as successive versions means that work can be reverted and collaboratively developed.
- Use a standard version control system (VCS).
- Everything that has been created manually should be put in version control.
Wilson et al. (2013)

Best Practice Coding
6. Don't repeat yourself (or others)
Programmers use the DRY principle to avoid repeating data analysis and rewriting code:
- Every piece of data must have a single authoritative representation in the system.
- At small scales, code should be modularized rather than copied and pasted.
- At large scales, reuse code instead of rewriting it.
Wilson et al. (2013)
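At the small scale, DRY means defining a value or calculation once and calling it wherever it is needed. In this illustrative sketch (the temperature conversion is just an example), the offset has one authoritative definition, and both datasets reuse the same function, so a correction made there applies everywhere.

```python
# Single authoritative definition of the conversion offset.
KELVIN_OFFSET = 273.15


def celsius_to_kelvin(temp_c):
    """Convert a temperature from degrees Celsius to kelvin."""
    return temp_c + KELVIN_OFFSET


# Reuse the function instead of pasting "+ 273.15" into each analysis.
lab_readings_c = [20.0, 21.5, 19.8]
field_readings_c = [5.2, 3.9]
lab_k = [celsius_to_kelvin(t) for t in lab_readings_c]
field_k = [celsius_to_kelvin(t) for t in field_readings_c]
```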

Best Practice Coding
7. Plan for mistakes (they're inevitable)
- Defensive programming: add assertions to programs to check their operation. If something goes wrong, the program halts immediately, which aids debugging; assertions are also executable documentation, i.e. they explain the program as well as checking its behaviour.
- Automated testing: check that a single unit of code is returning correct results, or that the behaviour of a program hasn't changed. Use an off-the-shelf unit testing library to initialize inputs, run tests, and report their results in a uniform way.
Wilson et al. (2013)
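Both ideas fit in a few lines of Python. The `normalise` function below is a hypothetical example: its assertions state what valid input looks like and stop the program at the first bad value, and the tests use the standard-library unittest module as the off-the-shelf testing library. Note that Python skips `assert` statements when run with the `-O` flag, so checks that must always run are better written as explicit `raise` statements.

```python
import unittest


def normalise(values):
    """Scale non-negative values so that they sum to 1."""
    # Assertions double as documentation of the function's assumptions.
    assert len(values) > 0, "need at least one value"
    assert all(v >= 0 for v in values), "values must be non-negative"
    total = sum(values)
    assert total > 0, "values must not all be zero"
    return [v / total for v in values]


class TestNormalise(unittest.TestCase):
    def test_sums_to_one(self):
        self.assertAlmostEqual(sum(normalise([1, 2, 3, 4])), 1.0)

    def test_keeps_proportions(self):
        self.assertEqual(normalise([1, 3]), [0.25, 0.75])

    def test_rejects_negative(self):
        with self.assertRaises(AssertionError):
            normalise([1, -1])
```

Saved as a file such as test_normalise.py, the tests run with `python -m unittest test_normalise`.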

Best Practice Coding
7. Plan for mistakes (they're inevitable), continued
- Use a variety of oracles: an oracle tells a developer how a program should behave or what its output should be. In research, oracles include analytical results, experimental results, and previous results from other tried-and-tested software.
- Turn bugs into test cases: write tests that trigger the bug, so that it is caught if it ever reappears.
- Use a symbolic debugger, which allows you to pause a program, inspect variable values, and step through the code to find the problem.
Wilson et al. (2013)
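An analytical result makes a good oracle for numerical code. In this sketch, a simple trapezoid-rule integrator (an illustrative example, not code from the slides) is checked against the exact integral of sin(x) from 0 to pi, which is 2; the tolerance is chosen to comfortably exceed the method's error for 1000 intervals. The second test shows the "turn bugs into test cases" pattern for a hypothetical past bug with a single interval.

```python
import math


def trapezoid(f, a, b, n):
    """Approximate the integral of f over [a, b] using n equal trapezoids."""
    if n < 1:
        raise ValueError("n must be at least 1")
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def test_against_analytical_result():
    # Oracle: the exact integral of sin(x) from 0 to pi is 2.
    assert math.isclose(trapezoid(math.sin, 0.0, math.pi, 1000), 2.0, rel_tol=1e-5)


def test_regression_single_interval():
    # Hypothetical past bug: n=1 returned the wrong area. The exact answer for
    # f(x) = x on [0, 2] is 2, and one trapezoid is exact for a straight line.
    assert trapezoid(lambda x: x, 0.0, 2.0, 1) == 2.0
```

Test runners such as pytest collect functions named `test_*` automatically; they can also simply be called directly.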

Best Practice Coding
8. Optimize software only after it works correctly
- In most cases, the most productive way to optimize code is to get it working correctly first, then identify the parts that can be sped up.
- Use a profiler to identify bottlenecks in your code.
- Write code in the highest-level language possible; you can always shift to a low-level language (like C or Fortran) if the performance boost is needed.
9. Document design and purpose, not mechanics
- Refactor code instead of explaining how it works, i.e. rather than writing a paragraph to explain a complex piece of code, reorganize it so that it is self-explanatory.
- Embed the documentation for a piece of software in that software.
Wilson et al. (2013)
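Python's standard library includes a profiler (cProfile) and a report formatter (pstats). The sketch below profiles a deliberately simple, clearly written function; the `pairwise_distances` and `profile` helpers are hypothetical examples. Its docstring records purpose and design rather than line-by-line mechanics, which is one way to embed documentation in the software itself. The report lists where time is spent, so you only optimize the parts that matter.

```python
import cProfile
import io
import math
import pstats


def pairwise_distances(points):
    """Return (i, j, distance) for every pair of (x, y) points with i < j.

    Design: a plain double loop, written for clarity first. It is O(n**2) in
    the number of points; optimize only if profiling shows it is a bottleneck.
    """
    result = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            result.append((i, j, math.dist(points[i], points[j])))
    return result


def profile(func, *args):
    """Run func(*args) under cProfile; return (result, text of the top entries)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


points = [(float(i), float(i % 7)) for i in range(300)]
distances, report = profile(pairwise_distances, points)
print(report)
```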

Best Practice Coding
10. Collaborate
- Code reviews are the most cost-effective way of finding bugs in code.
- Use pair programming when bringing someone new up to speed and when tackling particularly tricky problems: one developer writes the code while the other provides real-time feedback.
- In larger teams of developers, use an issue tracking tool to maintain a list of tasks to be performed and bugs to be fixed.
Wilson et al. (2013)