Published by Aleesha Douglas. Modified over 9 years ago.
1
CODING Research Data Management
2
Research Data Management
Coding
When writing software or analytical code it is important that others and your future self can understand what the code is doing.
Wilson et al. (2013) published 10 steps that they regard as the “Best Practices for Scientific Computing” and we agree.
“As scientists are never taught how to build software many are unaware of tools and practices that would allow them to write more reliable and maintainable code with less effort. We describe a set of best practices for scientific software development that have solid foundations in research and experience, and that improve scientists’ productivity and the reliability of their software.”
http://arxiv.org/pdf/1210.0530v3.pdf
Getting Started with Research Data Management
3
Best Practice Coding
1. Write programs for people, not computers
- A program should not require its readers to hold more than a handful of facts in memory at once.
- Names should be consistent, distinctive, and meaningful.
- Code style and formatting should be consistent.
- All aspects of software development should be broken down into tasks roughly an hour long (50-200 lines of code).
Wilson et al. (2013)
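As an illustration of the naming and task-size advice above, here is a minimal Python sketch. The function names and the temperature readings are hypothetical, chosen only to show short, single-purpose functions with meaningful names; they do not come from Wilson et al. (2013).

```python
def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def celsius_to_kelvin(temperature_celsius):
    """Convert a temperature from degrees Celsius to kelvin."""
    return temperature_celsius + 273.15


def mean_temperature_kelvin(readings_celsius):
    """Return the mean of a list of Celsius readings, expressed in kelvin."""
    return celsius_to_kelvin(mean(readings_celsius))


# Hypothetical readings, used only to exercise the functions.
readings_celsius = [19.5, 20.5, 21.0, 19.0]
print(mean_temperature_kelvin(readings_celsius))
```

Each function does one thing and its name says what that is, so a reader only has to keep a few facts in mind at a time.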
4
Best Practice Coding
2. Automate repetitive tasks
- Rely on the computer to repeat tasks.
- Save recent commands in a file for reuse; this could be as simple as using Make.
- Use a build tool to automate your scientific workflows.
3. Use the computer to record history
- Software tools should be used to track computational work automatically.
- It is already possible to record:
  - unique identifiers and version numbers for raw data records, programs and libraries
  - names and version numbers of programs, and the values of the parameters used to generate any given output
Wilson et al. (2013)
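One possible way to record history automatically is to have each analysis script write a small provenance record. The sketch below is an assumption about how that might look in Python using only the standard library. The `describe_run` helper, the sample data, and the parameter names are hypothetical.

```python
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone


def describe_run(input_bytes, parameters):
    """Build a provenance record for one analysis run (hypothetical helper)."""
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "script": sys.argv[0],
        # A content hash serves as a unique identifier for the raw data.
        "input_sha256": hashlib.sha256(input_bytes).hexdigest(),
        "parameters": parameters,
    }


# Stand-in for the contents of a raw data file.
raw_data = b"sample_id,value\n1,0.42\n2,0.57\n"
record = describe_run(raw_data, {"threshold": 0.5, "method": "median"})
print(json.dumps(record, indent=2, sort_keys=True))
```

Saving this record next to every output file makes it possible to tell later which data, interpreter version, and parameter values produced a given result.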
5
Best Practice Coding
4. Make incremental changes
- Work in small steps with frequent feedback and course correction.
- At each stage, check that the still-incomplete code is working correctly.
5. Use version control
- Keeping alterations in successive versions means that changes can be reverted and the work can be developed collaboratively.
- Use a standard version control system (VCS).
- Everything that has been created manually should be put in version control.
Wilson et al. (2013)
6
Best Practice Coding
Wilson et al. (2013)
6. Don’t repeat yourself (or others)
- Programmers use the DRY principle to avoid repeating analyses and rewriting code.
- Every piece of data must have a single authoritative representation in the system.
- At small scales, code should be modularized rather than copied and pasted.
- At large scales, re-use code instead of rewriting it.
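The small-scale DRY advice above can be sketched as follows. This is an illustrative Python example, not code from the paper. The constant, the function names, and the sample values are hypothetical.

```python
# Single authoritative definition: every conversion below refers to this name.
SECONDS_PER_DAY = 86_400


def days_to_seconds(days):
    """Convert a duration in days to seconds."""
    return days * SECONDS_PER_DAY


def normalize(values):
    """Rescale values linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        raise ValueError("cannot normalize a constant series")
    return [(value - lowest) / (highest - lowest) for value in values]


# The same function is reused for both groups instead of pasting the formula twice.
control = normalize([2.0, 4.0, 6.0])
treated = normalize([10.0, 15.0, 30.0])
print(control, treated, days_to_seconds(2))
```

If the normalization rule changes, it is edited in one place and both analyses pick up the change.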
7
Best Practice Coding
Wilson et al. (2013)
7. Plan for mistakes (they’re inevitable)
- Defensive programming: add assertions to programs to check their operation.
  - Assertions ensure that if something goes wrong, the program halts immediately, which aids debugging.
  - They are also executable documentation, i.e. they explain the program as well as checking its behaviour.
- Automated testing: check that a single unit of code returns correct results, or that the behaviour of a program hasn’t changed.
- Use an off-the-shelf unit testing library to initialize inputs, run tests, and report their results in a uniform way.
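A minimal sketch of both ideas, assuming Python and its standard `unittest` library as the off-the-shelf testing tool. The `rectangle_area` function and the test names are hypothetical examples.

```python
import unittest


def rectangle_area(width, height):
    """Return the area of a rectangle with non-negative side lengths."""
    # Assertions halt the program at the first sign of trouble and also
    # document what the function expects. Note that Python skips them
    # when run with the -O flag.
    assert width >= 0 and height >= 0, "side lengths must be non-negative"
    return width * height


class RectangleAreaTest(unittest.TestCase):
    def test_known_value(self):
        self.assertEqual(rectangle_area(3, 4), 12)

    def test_zero_width(self):
        self.assertEqual(rectangle_area(0, 5), 0)

    def test_negative_side_is_rejected(self):
        with self.assertRaises(AssertionError):
            rectangle_area(-1, 2)


# Run the tests programmatically and report the results in a uniform way.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(RectangleAreaTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

In a real project the tests would usually live in their own file and be run with a test runner rather than inline.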
8
Best Practice Coding
Wilson et al. (2013)
7. Plan for mistakes (they’re inevitable), continued
- Use a variety of oracles. An oracle tells a developer how a program should behave or what its output should be.
  - In research this includes analytical results, experimental results, and previous results from other tried and tested software.
- Turn bugs into test cases: write tests that trigger the bug and will prevent that bug from reappearing later.
- Use a symbolic debugger, which allows you to pause a program, inspect the variable values, and move up and down the code to find the problem.
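The oracle and bug-to-test ideas above might look like this in practice. The sketch is illustrative Python. The integration routine, the test names, and the `steps=0` bug are hypothetical, invented for the example. The analytical oracle is the known result that the integral of sin(x) from 0 to pi equals 2.

```python
import math


def trapezoid_integral(function, lower, upper, steps=1000):
    """Approximate the integral of function over [lower, upper] (trapezoid rule)."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    width = (upper - lower) / steps
    total = 0.5 * (function(lower) + function(upper))
    for i in range(1, steps):
        total += function(lower + i * width)
    return total * width


def test_against_analytical_oracle():
    # Oracle: an analytical result the numerical answer must approximate.
    estimate = trapezoid_integral(math.sin, 0.0, math.pi)
    assert math.isclose(estimate, 2.0, rel_tol=1e-5)


def test_zero_steps_is_rejected():
    # Hypothetical past bug: steps=0 once crashed with a division by zero.
    # This test triggers that case so the bug cannot silently return.
    try:
        trapezoid_integral(math.sin, 0.0, math.pi, steps=0)
    except ValueError:
        return
    raise AssertionError("steps=0 should raise ValueError")


test_against_analytical_oracle()
test_zero_steps_is_rejected()
print("all checks passed")
```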
9
Best Practice Coding
Wilson et al. (2013)
8. Optimize software only after it works correctly
- In most cases, the most productive way of optimizing code is to get it working correctly, then identify areas that can be sped up.
- Use a profiler to identify bottlenecks in your code.
- Write code in the highest-level language possible; you can always shift to a low-level language (like C or Fortran) if the performance boost is needed.
9. Document design and purpose, not mechanics
- Refactor code instead of explaining how it works, i.e. rather than write a paragraph to explain a complex piece of code, reorganize it so that it is self-explanatory.
- Embed the documentation for a piece of software in that software.
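As one way to follow the profiling advice above, the sketch below uses Python's standard `cProfile` and `pstats` modules. The functions being profiled are hypothetical stand-ins for a real analysis, and each carries an embedded docstring that states its purpose rather than its mechanics.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Return the sum of i*i for i below n (deliberately unoptimized)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def analysis():
    """Stand-in for an analysis step that calls the slow function repeatedly."""
    return [slow_sum_of_squares(20_000) for _ in range(50)]


# Profile the working code first, then decide what is worth speeding up.
profiler = cProfile.Profile()
profiler.enable()
results = analysis()
profiler.disable()

# Collect the five entries with the largest cumulative time into a text report.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

The report lists where the time is actually spent, which is often not where one would guess.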
10
Best Practice Coding
Wilson et al. (2013)
10. Collaborate
- Code reviews are the most cost-effective way of finding bugs in code.
- Use pair programming when bringing someone new up to speed and when tackling particularly tricky problems: one developer writes the code while the other provides real-time feedback.
- In larger teams of developers, use an issue tracking tool to maintain a list of tasks to be performed and bugs to be fixed.