Software Sustainability Institute
Software Attribution: can we improve the reusability and sustainability of scientific software?
NSF SI2 PIs Meeting, February 2014
Neil Chue Hong, Software Sustainability Institute
The Research Cycle
Cycle: Create, Test, Interpret, Publish, Revise
Research outputs: Paper, Data, Software
Research is a continuous cycle. When we publish, we are contributing to the body of knowledge.
Research/Reuse/Reward Cycle
Research: Create, Test, Interpret, Publish, Revise
Reuse: Index, Identify, Cite, Reward
Reuse is also a cycle. We build our research on the work of others. Reward mechanisms should encourage reuse.
The current process
- Start research
- Write software
- Use software
- Produce results
- Publish research paper (which mentions software and data)
- Release data
- Release software
This process is simple but does not reward production or reuse of good software and data. It also has a long contribution cycle.
A better process?
- Start research
- Identify existing software
- Write software
- Adapt/extend software
- Use software
- Produce results
- Release data
- Release software
- Publish software paper
- Publish data paper
- Publish research paper (which references software and data papers)
Software and data papers are needed as proxies for rewarding reuse, but this process enables a shorter contribution cycle for data and software.
Authorship Lifecycle
Cycle: Identify, Cite, Reuse, Research, Index
Papers, data and software are all research outputs of a continuous cycle. With software, technology makes contributions easier to track, but not to reward. We cannot separate papers, data and software when we release research.
Boundary
What do we choose to identify:
- Workflow?
- Software that runs workflow?
- Software referenced by workflow?
- Software dependencies?
What’s the minimum citable part?
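The boundary question can be made concrete by listing everything a workflow depends on, directly or indirectly. The following is a minimal sketch, assuming a hypothetical dependency map; every name in it is illustrative and not taken from the slides. It does not decide where the citable boundary lies, it only enumerates the candidates.

```python
from collections import deque

# Hypothetical dependency map: each item lists what it directly relies on.
# All names are illustrative assumptions, not taken from the slides.
DEPENDS_ON = {
    "my-workflow": ["workflow-engine", "analysis-tool"],
    "workflow-engine": ["runtime-lib"],
    "analysis-tool": ["numeric-lib", "runtime-lib"],
    "numeric-lib": [],
    "runtime-lib": [],
}


def citable_candidates(root, depends_on):
    """Return root plus everything reachable from it, in breadth-first order."""
    seen = [root]
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for dep in depends_on.get(current, []):
            if dep not in seen:
                seen.append(dep)
                queue.append(dep)
    return seen


candidates = citable_candidates("my-workflow", DEPENDS_ON)
print(candidates)
```

Even this toy workflow yields five candidates, which is why the question of the minimum citable part matters.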
Granularity
- Algorithm
- Function
- Program
- Library / Suite / Package
- …
Versioning
Personal versions: v1, v2, v2a, v3, v3a
Public versions: v1, v2, v3
Why do we version?
- To indicate a change
- To allow sharing
- To confer special status
Authorship
Which authors have had what impact on each version of the software?
Who had the largest contribution to the scientific results in a paper?
Figure: OGSA-DAI project statistics from Ohloh
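The first question can be partly approximated from version control history. The following is a minimal sketch, assuming hypothetical commit records of the form (author, version, lines changed); the data and the choice of lines changed as a measure are illustrative assumptions. Lines changed is a crude proxy and says nothing about who contributed most to the scientific results.

```python
from collections import defaultdict

# Hypothetical commit records: (author, version, lines_changed).
# Names and numbers are illustrative assumptions only.
COMMITS = [
    ("alice", "v1", 300),
    ("bob", "v1", 100),
    ("alice", "v2", 50),
    ("carol", "v2", 150),
]


def contribution_shares(commits):
    """Return, per version, each author's fraction of the lines changed."""
    totals = defaultdict(lambda: defaultdict(int))
    for author, version, lines in commits:
        totals[version][author] += lines

    shares = {}
    for version, per_author in totals.items():
        version_total = sum(per_author.values())
        if version_total == 0:
            continue  # no measurable change recorded for this version
        shares[version] = {
            author: lines / version_total
            for author, lines in per_author.items()
        }
    return shares


shares = contribution_shares(COMMITS)
for version, per_author in shares.items():
    print(version, per_author)
```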
Software Journals
Peer review of software?
Can the aspects of peer review be decoupled?
- Novelty and acceptability
- Validity and quality
Accurate metadata helps sustainability, but excessive metadata requirements are a barrier.
Essentially, for reuse and sustainability:
- Where is it?
- Who wrote it?
- How do I run it?
- How do I find out more?
Software Papers: Improving the reusability and sustainability of scientific software
Types of Metadata
- Name
- Provenance and Ownership
- Functionality and Constraints
- Content and Composition
- Environment and Dependencies
- Location
See also: Significant Properties of Software (Matthews et al), Software Ontology (Malone et al)
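The six metadata types above could be captured in a small record. The following is a minimal sketch, assuming hypothetical field names derived from the slide headings and made-up example values. It is not the schema of any journal or of the works cited above.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class SoftwareMetadata:
    """Hypothetical record with one field per metadata type on the slide."""
    name: str
    provenance_and_ownership: str
    functionality_and_constraints: str
    content_and_composition: str
    environment_and_dependencies: List[str] = field(default_factory=list)
    location: str = ""

    def missing_fields(self):
        """Return the names of fields that are still empty."""
        return [key for key, value in asdict(self).items() if not value]


# Illustrative values only; none of these describe real software.
record = SoftwareMetadata(
    name="example-tool",
    provenance_and_ownership="Written by the Example Lab",
    functionality_and_constraints="Fits curves to tabular data",
    content_and_composition="Source code, tests, user guide",
    environment_and_dependencies=["python>=3.8"],
)

print(record.missing_fields())
```

A check like `missing_fields` keeps the metadata requirement light: it flags gaps without blocking a release.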
JORS software metapaper
Anatomy of a software metapaper
- Introduction
- Implementation + Usage
- Screenshots
- Quality Control
- Metadata
- Reuse
- References
Other journals you can publish software in:
- F1000Research Web Tool
Code as a Research Object
What if you could assign DOIs to code easily?
Could we make software more reusable?
Alternative Metrics
I can get credit for everything
- Automatically generated from GitHub Repository
- Starring as a means of recommendation
- Forking analogous to citing for software
- … but not necessarily reward
Career Paths in UK
Figure: UK STEM graduate career paths
- PhD students
- Early Career Research
- Permanent Research Staff
- Professor
- Non-university Research (industry, government etc.)
- Careers outside academic sector
Source: The Scientific Century, Royal Society, 2010 (revised to reflect first stage clarification from “What Do PhD’s Do?” study)
Where we are now
- We must describe and cite software, otherwise we cannot benefit from and reward reuse and refinement
- Software papers are a citation mechanism that works with existing infrastructure and norms
- Direct citation of code + metadata might be better
- But we still need to fix the reward mechanism for non-traditional research outputs
- And this is entirely in our hands as scientists
Further Information
- Software Papers: Improving the reusability and sustainability of scientific software
- Journals in which you can publish software:
- Journal of Open Research Software
- Discussion: what is the minimum metadata required to describe a code object for scientific reuse?
- Contribute:
- Code as a research object:
- The DOI for this presentation: /m9.figshare
The Software Sustainability Institute is a collaboration between the universities of Edinburgh, Manchester, Oxford and Southampton. Supported by EPSRC Grant EP/H043160/1.