Where might software fit with CoreTrustSeal?
Additional material on SSoA contributed by Patrick Aerts, NLeSC
21st September 2017, RDA10 Plenary, Montreal
Neil Chue Hong (@npch), Software Sustainability Institute
ORCID: 0000-0002-8876-7606 | N.ChueHong@software.ac.uk
Supported by Project funding from
Slides licensed under CC-BY where indicated
“Trustworthiness” of software isn’t new
From software engineering:
- ISO/IEC 9126 and 25000 Software Product Quality
- Software Engineering Body of Knowledge
From research software:
- NASA Reuse Readiness Levels
- Ontosoft Portal – describing geosciences software
- ESIP – research software guidelines for earth sciences
- CLARIAH – software quality guidelines
- DLR – software maturity levels for aerospace codes
- Depsy and Libraries.io – dependencies and impact
- ELIXIR – 10 Metrics for life sciences software good practices
Why do we care?
- Researchers
  - Make it easier to choose between software
  - Incentivise production of good software
  - Independent assessment
- Software Developers
  - Increase recognition of good practice
  - Facilitate discovery and reuse
- Infrastructure Providers
  - Drive traffic to projects / repository
- Industry
  - Improve reuse and commercialisation
  - Encourage use by small and medium-sized enterprises (SMEs)
- Funders
  - Enable more efficient investment
- Creating a basis for a marketplace for research software
Software sustainability
- Treat Software Sustainability and Data Stewardship on equal footing
  - At least policy-wise
- Consider Software (and data) as value objects
  - Then it starts making sense to spend some to keep the value or increase it
- Make the stakeholder positions explicit, define their role and involve all
  - Funders, scientists, executive organisations
Patrick Aerts, NLeSC/DANS
Approving Data: DSoA / ODI
Repos vs Producers vs Product
- For data, we have repositories and datasets
  - Data Seal of Approval applied to repository
  - ODI certificates applied to dataset
- For software, we can choose to assess the software producer, software project or the software product
  - Software Seal of Approval for producer?
  - Software Assessment Certificate for product?
  - Also see CHAOSS project for community health
Comparisons
- Software Producer
  - Assess quality of software development process and practice
  - Indicator of track record, can be used to estimate likely quality of software produced
  - Apply to individual, team, group, or organisation?
- Software Product
  - Assess quality of a piece of software
  - Measures are often subjective and contrasting (e.g. interface stability vs functionality updating)
  - Apply to particular version, updated regularly?
Purpose of a Software Seal of Approval
Establishing a Software Seal of Approval will serve three purposes:
- To get scientific software developed at a state-of-the-art level of “quality”
- To get a grip on the cost of future software maintenance after a project is finished
- To provide an additional criterion for research councils/funders to credit researchers
Developing the SSoA
The SSoA:
- Is under development internationally
  - Workshops setting specs and defining levels of Seals
- Mostly concentrating on quality of the development process
  - Version control, documentation, maintenance, modularity, etc.
- In close co-operation with DANS/SSI and (expected) the KE group
SSoA for producers – rewarding process and practice
Levels: 1 (No badge / Red) | 2 (Bronze / Amber) | 3 (Gold / Green)
Criteria (entries listed from lowest to highest level):
- Description / website: Doesn’t explain s/w | Basic description | Clear description
- Source code: Not available | Uses code repository | Uses CI
- Licensing: Unclear | Clearly stated | Enables reuse
- Support: Issues tracker / Mail | Support mechanism
- Communications: None | Email address | Mail list / forum etc
- Testing: No tests or data | Test data available | Automated testing
- User documentation: Basic README | Clear examples
- Dev documentation: Basic API-level docs
- Release cycle: Unclear / no recent | Regular releases | Release roadmap
- Contributor Process: Clear process
- Examples of use: No examples of use in research | One or two examples | Several case studies / examples
- Related pubs / data / citations: Some | Many
Notes:
- Red / Amber / Green, Bronze / Silver / Gold
- Should there be a split at the higher level?
- Use Good Practices guidelines
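As a rough illustration of how such a rubric could be applied, the sketch below encodes only the criteria whose three levels are all present on the slide. The names `Level`, `RUBRIC`, `overall_level` and `describe` are hypothetical, and the aggregation rule (the weakest criterion decides the overall level) is an assumption made for the example; the slides do not say how individual criteria combine into a seal.

```python
from enum import IntEnum


class Level(IntEnum):
    RED = 1    # "No badge / Red" on the slide
    AMBER = 2  # "Bronze / Amber"
    GREEN = 3  # "Gold / Green"


# Only criteria with all three cells filled in on the slide are listed here.
RUBRIC = {
    "description": ("Doesn't explain s/w", "Basic description", "Clear description"),
    "source_code": ("Not available", "Uses code repository", "Uses CI"),
    "licensing": ("Unclear", "Clearly stated", "Enables reuse"),
    "communications": ("None", "Email address", "Mail list / forum etc"),
    "testing": ("No tests or data", "Test data available", "Automated testing"),
    "release_cycle": ("Unclear / no recent", "Regular releases", "Release roadmap"),
    "examples_of_use": (
        "No examples of use in research",
        "One or two examples",
        "Several case studies / examples",
    ),
}


def overall_level(scores):
    """Assumed rule (not from the slides): the weakest criterion decides."""
    missing = set(RUBRIC) - set(scores)
    if missing:
        raise ValueError(f"unscored criteria: {sorted(missing)}")
    return min(Level(scores[name]) for name in RUBRIC)


def describe(scores):
    """Map each scored criterion to the rubric text for its level."""
    return {name: RUBRIC[name][Level(scores[name]) - 1] for name in RUBRIC}


# Example: a producer that is Green everywhere except testing.
example_scores = {name: 3 for name in RUBRIC}
example_scores["testing"] = 2
print(overall_level(example_scores).name)
print(describe(example_scores)["testing"])
```

Taking the minimum is a deliberately strict choice; a weighted or majority rule would be equally easy to swap in once the levels are agreed.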
Establishing a SSoA
- Understand how SSoA measures line up with FAIR
- Build on existing guidelines and get consensus
- Understand how we encourage trust and usage of a “Software Seal of Approval”
- Finalize discussions 1Q2018
- Form “board” from a coalition of the willing 2Q2018
- Be in business end 1H2018
CHAOSS metrics
Repositories
- What about software repositories?
  - “Most software is not published through a repository”
  - “Most software in a repository is in GitHub”
  - “I deposit my software in Figshare / Zenodo”
- But if it is in a repository…
  - … does it matter what kind of repository?
  - Institutional vs Figshare / Zenodo
  - Digital repository versus code repository
Research Software Workflow
- Develop: developed and versioned using code repository
- Share: published via code repository or website
- Preserve: deposited in digital repository with paper / for preservation
Figshare example
- Source code from GitHub can be deposited in Figshare
  - Figshare “mothership” or institutional Figshare
  - Also collection managers
- Who are we “trusting”?
  - For institutional Figshare, the local repository managers?
  - For Figshare mothership, is it Figshare or the person depositing the code?
Institutional Repositories
- Increasingly, institutional repositories are being used for code
- An issue has been support for licenses / metadata on deposit
- If the repository has gone through the CoreTrustSeal process, can it “upgrade” for software?
Things which are similar
- R0: Repository Type
- R3: Continuity of access
- R4: Confidentiality / Ethics
- R5: Organizational infrastructure
- R7: Data integrity
- R9: Documented storage procedures
- R10: Preservation plan
- R13: Data discovery and identification
- R15: Technical Infrastructure
- R16: Security
Things which are slightly different
- R0: Level of curation – what does enhanced mean in the case of software?
- R1: Mission / scope – more “non-research specific” repositories used for software
- R2: Licenses – what does it mean to maintain software licenses?
- R6: Expert guidance – what about non-research specific repositories?
- R8: Appraisal – do you need different/additional policies, e.g. for understanding execution?
- R12: Workflows – specific to software deposit, “timing” may be different
Things which require discussion
- R11: Data quality
  - What should the assessment of “software quality” comprise?
  - Documentation – build on work from software papers?
  - Automated assessment – of metadata only (licenses, citations) or more?
- R14: Data reuse
  - More than metadata required for software reuse
  - What does format change mean for software?
  - Portability across platforms / OS versions?
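To make the "metadata only" end of automated assessment concrete, here is a minimal sketch that checks whether a deposit's top-level file listing contains license and citation metadata. The function name `assess_metadata` and the sets of accepted file names are assumptions chosen for illustration (LICENSE, COPYING, CITATION.cff and codemeta.json are common community conventions, not something the slides prescribe), and a real assessment would likely need to inspect file contents as well.

```python
# Hypothetical file-name conventions; compared case-insensitively.
LICENSE_NAMES = {"license", "license.txt", "license.md", "copying"}
CITATION_NAMES = {"citation.cff", "citation", "codemeta.json"}


def assess_metadata(filenames):
    """Report which kinds of metadata files appear in a top-level file listing."""
    names = {name.lower() for name in filenames}
    return {
        "has_license": bool(names & LICENSE_NAMES),
        "has_citation": bool(names & CITATION_NAMES),
    }


# Example listing for an imagined deposit.
print(assess_metadata(["README.md", "LICENSE", "src"]))
```

A check like this only establishes that metadata exists; whether the license actually enables reuse, or whether the software runs, is the "or more?" part of the question.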
What we’d need to do
- Provide guidance for interpretation of CoreTrustSeal for software deposit
- Clarify the metadata required for software preservation
- Understand where code repositories and large third-party repositories fit
- Provide the last level of certification with:
  - Software Seal of Approval for certifying producers
  - CHAOSS for measuring projects / products
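As a starting point for such guidance, the grouping of CoreTrustSeal requirements from the three preceding slides can be captured in a small lookup. This is only a sketch: the names `CATEGORIES`, `categories_for` and `uncovered` are hypothetical, and the assumption that requirements run from R0 to R16 is taken from the highest number mentioned on the slides.

```python
# Requirement groupings as listed on the slides; R0 appears in two groups
# (Repository Type is similar, Level of curation is slightly different).
CATEGORIES = {
    "similar": ["R0", "R3", "R4", "R5", "R7", "R9", "R10", "R13", "R15", "R16"],
    "slightly different": ["R0", "R1", "R2", "R6", "R8", "R12"],
    "requires discussion": ["R11", "R14"],
}


def categories_for(requirement):
    """Return every group in which a requirement appears."""
    return [name for name, reqs in CATEGORIES.items() if requirement in reqs]


def uncovered(last=16):
    """List requirements R0..R<last> that no group mentions (assumed range)."""
    seen = {r for reqs in CATEGORIES.values() for r in reqs}
    return [f"R{i}" for i in range(last + 1) if f"R{i}" not in seen]


print(categories_for("R0"))
print(categories_for("R11"))
print(uncovered())
```

Keeping the groups as lists rather than a one-to-one mapping allows a requirement such as R0 to sit in more than one group, which matches how the slides treat it.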