Sociotechnical production systems for software in science James Howison and Jim Herbsleb Institute for Software Research School of Computer Science Carnegie.

Presentation transcript:

Sociotechnical production systems for software in science James Howison and Jim Herbsleb Institute for Software Research School of Computer Science Carnegie Mellon University School of Information University of Texas at Austin

How does a cubic km of ice become a scientific paper?

First find some ice Image Credit: NASA

Build a big drill Image Credit: IceCube

and some Digital Optical Modules Image Credit: IceCube

Combine Image Credit: IceCube

Collect and filter data Image Credit: IceCube

Store and analyze it Image Credit:

Simulate light in ice Photo credit:

Simulate Atmosphere Image Credit: NASA

Model

Analyze

Plots

Publish

Software is everywhere

An appealing vision of software …
– Enhancing reproducibility and correctness
– Saving money
– Driving innovation
– Coalescing into widely used software platforms
All linked to software as information artifact: re-playable, re-usable, extendable

Yet software also has constraints
Maintenance (avoiding “bit rot”)
– Software must be maintained (“synchronization work”)
– Kept in sync with complements and dependencies
Coordination
– Rapid development and changes can lead to breakdown
Path dependencies
– Easy to start, hard to architect for widespread use

How to achieve the Software Vision? Better technologies? Better engineering methods? Leadership/Norms/Ethics? Policy? Rewards?

A sociotechnical understanding
Understand software work in existing institutions of science
Specific Research Questions:
– What software is used?
– Who creates and maintains it?
– What incentives drive its creation?
– Why is it trusted?

Method: Data
– Route into complex practice: chose paper as unit of analysis (“Focal Paper”); trace back from paper to work that produced it
– Semi-structured interviews: supported by artifacts (e.g., paper/methods and materials); elicit workflow, focus on software work; identify software authors/sources, and seek introductions
– Qualitative analysis
– Phenomenological exhaustion

Case 1: STAR Image Credit: RHIC

Our focal paper

Workflow

Software Production
1. Employed core software development
– Professional software developers
– ROOT4STAR framework
2. Core simulation code
– Scientists undertaking “service work”
3. Analysis code
– “to get the plots”
– Locally written, frozen at publication

Case 3: Bioinformatic microbiology Image Credit:

Studying the nitrogen cycle Image Credit: Focal Paper

A field revolutionized by software

Personal software infrastructure
“Power user scripts”
Personal competitive advantage: “that is something that most biologists can’t do. period.”
Share methods but not personal infrastructure code or actively support others
– Methods and materials section should provide enough information, if not he’ll fix it.
– But not going “to do their homework for them”

“Publishing on” software
Tools potentially useful to others described in separate publications, “Software pubs”
Ambivalence:
– Can you make a career out of this? “Definitely”
– But: “he’s known for his software rather than his science … he’s known for facilitating science rather than … and some people have that reputation”
– Advise a student to do this?
– “Yes, but … if you happen to get a publication out of it and it becomes a tool that’s widely used, then great, that’s fantastic, better props for you … but there’s a danger … Tool developers are greatly under-appreciated”

Algorithm people
Self-described member of the “algorithm people” as distinguished from biologists
Muscle: “biology == strcmp()”
Builds from scratch (“avoid tricky dependencies”)
“Obvious” that they don’t collaborate
– Credit accrues to the “original publications”
– Little credit in perceived incremental improvements
– Politics of improvement acceptance: “at the mercy of”
– Competition is appropriate and productive

Software Production systems
Practice that is similar on four aspects:
1. Incentives for the work
2. The type of artifacts produced
3. The way it is organized
4. The logic of correctness

Context: Academic reputation system

Software as support

Collaboration service-work

Academic credit: Incidental software

Academic credit: Parallel software practice

Systemic threats to software vision
The type of software work needed to realize the cyberinfrastructure vision is poorly motivated
– “Invisible work” (Star and Ruhleder)
Especially, little incentive to collaborate
– Project “owned” by initial creators
– Initial publications receive citations
– Extension dominated by fork-and-rename

Academic reputation and integration James Howison and Jim Herbsleb (2013) Sharing the spoils: incentives and integration in scientific software production. ACM CSCW

Where to for science policy?
Exhortations? Training?
Forcing “open source” through funding lever?
– Risk of substituting logics of correctness: “Kleenex” code as open source?
– Risk of undermining appropriate competition
– Turn scientists into open source community managers? When there is little reward for this work?

Scientific Software Network Map But, you know, imagine it as a live, dynamic data set!

Techniques for measuring use
Software that reports its own use
– Instrumentation
Analysis of traces in papers
– Mentions, citations
– Characteristic artifacts
Analysis of collections of software
– On supercomputing resources (TACC, NICS)
– Through workflow systems (Galaxy, Pegasus, Taverna)
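
One of the techniques above, analysis of mentions in papers, can be illustrated with a minimal sketch. This is not the authors' method: the function name `count_software_mentions`, the sample sentences, and the simple whole-word matching rule are all assumptions made for illustration. It counts how many paper texts mention each software name at least once.

```python
import re
from collections import Counter


def count_software_mentions(texts, software_names):
    """Count how many texts mention each software name at least once.

    Hypothetical helper for illustration only. Matching is whole-word and
    case-insensitive, which is a simplifying assumption: it will also count
    ordinary uses of a word (e.g. an astronomical "galaxy").
    """
    counts = Counter()
    for text in texts:
        for name in software_names:
            # \b keeps "Taverna" from matching inside "tavernas".
            pattern = r"\b" + re.escape(name) + r"\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    # Made-up sentences standing in for paper full text.
    papers = [
        "We ran the workflow in Galaxy and Taverna.",
        "Reads were processed with galaxy.",
        "The authors met in several tavernas.",
    ]
    print(count_software_mentions(papers, ["Galaxy", "Pegasus", "Taverna"]))
```

A real analysis would need to handle name variants, ambiguous names, and formal citations as well as informal mentions, none of which this sketch attempts.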

Contact James Howison This material is based upon work supported by the US National Science Foundation under Grant No. #