Open sharing and maintenance of scientific code Jordan S Read; Luke A Winslow 2013-08-20.

Presentation transcript:

Open sharing and maintenance of scientific code Jordan S Read; Luke A Winslow

Background
Who I am
– USGS-CIDA
– 2012 PhD in physical limnology (UW-Madison)
– Civil Engineer
My experience with code and model development
– Lake Analyzer
– CLM
– rGDP; rGLM
– Numerous collaborations

Background
My philosophy on science code: “Code created for the pursuit of science questions should be open, accessible, and designed to enable others to build from”
Kind of like your scientific publications, right?
That means I shouldn’t be able to build my scientific livelihood around a piece of “black-box” code

Background
My responsibility as a member of the science community: “Methods used to obtain published results should be clear, transparent and repeatable”
My responsibility as a federal employee: “Provide public access to all elements of publicly funded research”

Road map
Part I
– My experiences with science code development
– Motivation to open up your scientific code
Part II
– Maintaining and modifying code
– Code collaboration

Lake Analyzer
GLEON background
– Hanson & Hamilton collaboration and student exchange
– Physics & Climate working group
Requirements
– Easy to use
– Provide access to complex physical derivatives
– Handle dataset irregularities (errors, gaps, intermittent sampling frequencies, etc.)
– Rapid processing of large datasets

Lake Analyzer
I took on the role of primary coder
– Why? GLEON had paid my travel to two meetings…including NZ!
I did the work in MATLAB, because that is what I was most familiar with
Side project during grad school
Built from feedback from GLEON physics & climate group

Lake Analyzer

Repeatable
– .lke file ~ metadata
Visualizations (plotting options for outputs)
Easy to use

Lake Analyzer
Software publication
Open codebase
Platform/language independence
Useful and citable
– 19 citations in ~20 months

Opening up scientific code
Publishing your code
– Would a simple paper of physical derivations be cited at this rate?
– Would a methods paper be as popular if the code wasn’t available/open?
– Additional motivation for creation of code
Writing open code
– More use
– Ease of collaboration
– Integrity/transparency

Opening up scientific code
Reasons many choose not to open code
– Too much work
– Code is too messy
– Potential for criticism
– Code as scientific livelihood
– Has known errors…
– Others?

Opening up scientific code
When to put in the effort
– Collaborations
– When you are doing it “right”
– When you will use it in the future
– When you are publishing something
– When you have to
– Others?

Part II: Maintaining code
So…the code works, what’s next?
How do I take risks with code?
– i.e., changing the way a function works
– What if I make a mistake? (undo+undo+undo…?)
How do multiple people collaborate on a single set of scripts?
– In serial?
– Google Docs vs. Word for writing a paper

Maintaining code
Risky modifications
– Metabolism_modelv28.R?
– Metabolism_model_NEW.R?
– Metabolism_model_NEWsecondTRY.R?
– Metabolism_model_NEWEST.R?

Maintaining code
When we publish, we use track changes
– Can we do the same for code?
Version management
– AKA: version control, revision control, source control
– How it works
– Why you should know what it means
– Benefits to using version management: historical record of code evolution; easy to “roll back” to previous working version; the code has only one home
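
The "track changes for code" idea can be illustrated with a line-by-line diff. The snippet below is a minimal sketch using Python's standard `difflib` module, not the tooling used in the talk; the function `scale` and the file labels are hypothetical and exist only for illustration.

```python
import difflib

# Hypothetical "before" and "after" versions of a small analysis function.
before = [
    "def scale(values, factor):\n",
    "    return [v * factor for v in values]\n",
]
after = [
    "def scale(values, factor=1.0):\n",
    "    return [v * factor for v in values]\n",
]

# unified_diff yields header lines, then '-' for removed and '+' for added lines.
diff_lines = list(
    difflib.unified_diff(
        before,
        after,
        fromfile="model.py (before)",  # hypothetical label
        tofile="model.py (after)",     # hypothetical label
    )
)
print("".join(diff_lines))
```

Version management tools record this kind of difference for every saved change, which is what makes the history readable and reversible.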

Maintaining code
How it works
– Creates a “life history of code”
“Hey, nice sweater”
“Thanks. I travel a lot.”
“Want to start a project?”
“Sure! I have some modeling code”
“So do I! Let’s combine our efforts”

Maintaining code Here is a new set of methods

Maintaining code I made some improvements

Maintaining code Whoops! Fixed a bug
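
The sequence of changes on the preceding slides can be mimicked with a toy, in-memory history. This is a minimal sketch of the concept only, assuming a single file whose full content is stored at each step; it is not how any real version control system stores data, and the class names, author names, and file contents are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Commit:
    author: str
    message: str
    content: str  # full snapshot of the (hypothetical) script at this point


@dataclass
class ToyHistory:
    commits: List[Commit] = field(default_factory=list)

    def commit(self, author: str, message: str, content: str) -> int:
        """Record a new snapshot and return its index in the history."""
        self.commits.append(Commit(author, message, content))
        return len(self.commits) - 1

    def current(self) -> str:
        """Return the most recent snapshot."""
        return self.commits[-1].content

    def roll_back(self, index: int) -> int:
        """Restore an earlier snapshot by recording it as a new commit,
        so the history itself is never erased."""
        old = self.commits[index]
        return self.commit("rollback", f"Restore version {index}", old.content)

    def log(self) -> List[str]:
        return [f"{i}: {c.author}: {c.message}" for i, c in enumerate(self.commits)]


history = ToyHistory()
history.commit("author_a", "Here is a new set of methods", "methods v1")
good = history.commit("author_b", "I made some improvements", "methods v2")
history.commit("author_a", "Risky experiment", "methods v3 (broken)")
history.roll_back(good)

print("\n".join(history.log()))
print(history.current())
```

Because a rollback is itself recorded as a new entry, the "life history" stays complete even after a mistake is undone.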

Conclusions
Code as if it will be seen and used by others
– You may be that “other” in 3 years
Decide if creating publicly usable code makes sense for your research
Make your code accessible to collaborators
Consider the concepts embedded in version management

Jordan S Read USGS Center for Integrated Data Analytics | Questions? Thanks GLEON FP & TLS!