Open sharing and maintenance of scientific code Jordan S Read; Luke A Winslow
Background Who I am – USGS-CIDA – 2012 PhD in physical limnology (UW-Madison) – Civil Engineer My experience with code and model development – Lake Analyzer – CLM – rGDP; rGLM – Numerous collaborations
Background My philosophy on science code: “Code created for the pursuit of science questions should be open, accessible, and designed to enable others to build from it” Kind of like your scientific publications, right? That means I shouldn’t be able to build my scientific livelihood around a piece of “black-box” code
Background My responsibility as a member of the science community: “Methods used to obtain published results should be clear, transparent and repeatable” My responsibility as a federal employee: “Provide public access to all elements of publicly funded research”
Road map Part I My experiences with science code development Motivation to open up your scientific code Part II Maintaining and modifying code Code collaboration
Lake Analyzer GLEON background – Hanson & Hamilton collaboration and student exchange – Physics & Climate working group Requirements – Easy to use – Provide access to complex physical derivatives – Handle dataset irregularities Errors, gaps, intermittent sampling frequencies, etc. – Rapid processing of large datasets
Lake Analyzer I took on the role of primary coder – Why? GLEON had paid my travel to two meetings…including NZ! I did the work in MATLAB, because that is what I was most familiar with Side project during grad school Built from feedback from GLEON physics & climate group
Lake Analyzer
Repeatable – .lke file ~ metadata Visualizations (plotting options for outputs) Easy to use
Lake Analyzer – Software publication – Open codebase – Platform/language independence – Useful and citable: 19 citations in ~20 months
Opening up scientific code Publishing your code – Would a simple paper of physical derivations be cited at this rate? – Would a methods paper be as popular if the code wasn’t available/open? – Additional motivation for creation of code Writing open code – More use – Ease of collaboration – Integrity/transparency
Opening up scientific code Reasons many choose not to open code – Too much work – Code is too messy – Potential for criticism – Code as scientific livelihood – Has known errors… – Others?
Opening up scientific code When to put in the effort – Collaborations – When you are doing it “right” – When you will use it in the future – When you are publishing something – When you have to – Others?
Part II: Maintaining code So…the code works, what’s next? How do I take risks with code? – i.e., changing the way a function works – What if I make a mistake? (undo+undo+undo…?) How do multiple people collaborate on a single set of scripts? – In serial? – Google Docs vs. Word for writing a paper
Maintaining code Risky modifications – Metabolism_modelv28.R? – Metabolism_model_NEW.R? – Metabolism_model_NEWsecondTRY.R? – Metabolism_model_NEWEST.R?
Maintaining code When we publish, we use track changes – Can we do the same for code? Version management – AKA: version control, revision control, source control – How it works – Why you should know what it means – Benefits to using version management Historical record of code evolution Easy to “roll back” to previous working version The code has only one home
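Track changes for code, sketched: the comparison that a version management tool makes between two versions of a file can be imitated with Python's standard difflib module. This is only an illustration of the idea, not how any particular tool is implemented. The file name, the two versions of daily_mean, and the helper track_changes are all hypothetical.

```python
import difflib

# Two hypothetical versions of the same small script (illustration only).
old_version = """def daily_mean(values):
    return sum(values) / 24
"""
new_version = """def daily_mean(values):
    return sum(values) / len(values)
"""


def track_changes(old, new, name="metabolism_model.py"):
    """Return a unified diff (a 'track changes' view) as a list of lines."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile=f"old/{name}",
            tofile=f"new/{name}",
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Removed lines start with '-', added lines start with '+'.
    for line in track_changes(old_version, new_version):
        print(line)
```

Lines starting with "-" were removed and lines starting with "+" were added, so the change to the function is visible without keeping two differently named copies of the file.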
Maintaining code How it works – Creates a “life history of code”
“Hey, nice sweater” “Thanks. I travel a lot.” “Want to start a project?” “Sure! I have some modeling code” “So do I! Let’s combine our efforts”
Maintaining code Here is a new set of methods
Maintaining code I made some improvements
Maintaining code Whoops! Fixed a bug
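The “life history of code” in the last few slides can be sketched as a toy model: every change is recorded with a message, the record can be listed, and an earlier version can be restored. This is a simplified sketch under stated assumptions, not how a real version management system stores data. The class names Commit and CodeHistory, their methods, and the example contents are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    message: str  # short description of what changed
    content: str  # full text of the code at this point


@dataclass
class CodeHistory:
    """Toy 'life history of code': an append-only list of commits."""

    commits: list = field(default_factory=list)

    def commit(self, content, message):
        """Record a new version and return its index in the history."""
        self.commits.append(Commit(message, content))
        return len(self.commits) - 1

    def current(self):
        """Return the most recent version of the code."""
        return self.commits[-1].content

    def log(self):
        """Return the historical record, oldest first."""
        return [f"{i}: {c.message}" for i, c in enumerate(self.commits)]

    def roll_back(self, index):
        """Restore an earlier version by recording it as a new commit,
        so the history itself is never lost."""
        old = self.commits[index]
        return self.commit(old.content, f"Roll back to {index}: {old.message}")


if __name__ == "__main__":
    history = CodeHistory()
    history.commit("methods v1", "Here is a new set of methods")
    history.commit("methods v2", "I made some improvements")
    history.commit("methods v3", "Whoops! Fixed a bug")
    history.roll_back(1)  # return to the version with the improvements
    for entry in history.log():
        print(entry)
```

Because rolling back adds a new entry instead of deleting later ones, the code keeps one home and every earlier state stays recoverable.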
Conclusions Code as if it will be seen and used by others – You may be that “other” in 3 years Decide if creating publicly usable code makes sense for your research Make your code accessible to collaborators Consider the concepts embedded in version management
Jordan S Read USGS Center for Integrated Data Analytics | Questions? Thanks GLEON FP & TLS!