Impact Panel SI^2 PIs Meeting
Software Cycle: How does the use of your software generate impact? How does your impact generate resources for sustainability?
What impacts are SI2 projects aiming for? How ought those impacts be measured? How ought those impacts be communicated to others? Successful infrastructure might be invisible, but it didn’t become infrastructure by being invisible.
Survey summary (metrics)
  Level 0: Haven’t thought much about it yet
  Level 1 (our activity): Our publications; speed of development activity (bugs tracked/fixed)
  Level 2 (others’ activity): Count ‘em (downloads, citations); direct and indirect impact (software and papers)
  Level 4? (activity in context): Finding tipping points (e.g. activity of others exceeds ours); describing novel science that is facilitated
  Passive vs active collection
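The "tipping point" idea above (activity of others exceeds ours) can be sketched as a simple comparison of two activity series. This is a minimal illustration, not a method from the survey: the function name `first_tipping_point` and the per-quarter commit counts are hypothetical.

```python
from typing import Optional, Sequence


def first_tipping_point(ours: Sequence[int], others: Sequence[int]) -> Optional[int]:
    """Return the index of the first period in which others' activity
    exceeds ours, or None if that never happens."""
    for period, (own, external) in enumerate(zip(ours, others)):
        if external > own:
            return period
    return None


# Hypothetical commits per quarter (illustrative numbers only).
core_team = [40, 38, 35, 30, 28]
community = [2, 10, 25, 33, 41]

tipping_quarter = first_tipping_point(core_team, community)
print("Tipping point at quarter index:", tipping_quarter)
```

Any activity measure that can be split into "ours" and "others'" (commits, bug reports, mailing-list posts) could be substituted for the commit counts.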
Panel members
  Michael McLennan, HubZero
  Jason Priem, ImpactStory.org
  Doug Thain, Cooperative Computing Tools (SI^2)
  Jim Jagielski, Apache Foundation/Incubator (travel delay)
  Robert van de Geijn, BLIS (SI^2)
HUBzero: Measuring the Impact of Scientific Works Michael McLennan Director, HUBzero® Platform for Scientific Collaboration Purdue University NSF SI2 PI Meeting, January 17-18, 2013
What is HUBzero? Lesser General Public License, LGPL-3.0 Linux/Apache/MySQL/PHP Download: http://hubzero.org/download 40+ sites worldwide
Metrics Reported for Resources
Metrics Reported for Contributors
Metrics Reported about Community
Tracking Citations
Jason Priem
Doug Thain
Douglas Thain, Todd Tannenbaum, and Miron Livny, How to Measure a Large Open Source Distributed System, Concurrency and Computation: Practice and Experience, 18(15) 2006.
SI2-SSE: Connecting Cyberinfrastructure with the Cooperative Computing Tools
Douglas Thain, University of Notre Dame http://www.nd.edu/~ccl
Strategy: Create optional online interactions that have a direct benefit to the end user and to the service provider alike.
  Public Global Namespace http://chirp.cse.nd.edu
  Offline Data Analysis Service http://condorlog.cse.nd.edu
  Real Time Service Monitoring http://www.nd.edu/~ccl/viz
Apache Incubation and Impact
Apache Incubation Process: real focus on external contributors, because contribution indicates impact. Contributors put their money/time where their mouth is.
Techniques for measuring impact
  Traces on your website/infrastructure: registrations, downloads, clickstreams; tracing support interactions (discussions)
  Traces in publications: citations (and other mentions); characteristic artifacts (esp. figures)
  Traces in execution: software that reports its own usage
  Studies of use (e.g., productivity)
  Evidence in workflows
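As one illustration of "traces on your website", download counts can be pulled from web server access logs. The sketch below assumes logs in the Common Log Format and a hypothetical `/download/` URL prefix; it counts distinct client hosts per file, which gives a rough lower-noise alternative to raw hit counts. The sample log lines are invented for the example.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the host, request path, and status of a Common Log Format GET line.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] "GET (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)


def count_downloads(lines: Iterable[str], prefix: str = "/download/") -> Counter:
    """Count distinct client hosts per successfully served path under `prefix`."""
    hosts_per_path = {}
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not well-formed GET requests
        if match.group("status") == "200" and match.group("path").startswith(prefix):
            hosts_per_path.setdefault(match.group("path"), set()).add(match.group("host"))
    return Counter({path: len(hosts) for path, hosts in hosts_per_path.items()})


# Invented sample log lines (documentation-range IP addresses).
sample = [
    '203.0.113.5 - - [17/Jan/2013:10:00:00 +0000] "GET /download/tool-1.0.tar.gz HTTP/1.1" 200 1024',
    '203.0.113.5 - - [17/Jan/2013:10:05:00 +0000] "GET /download/tool-1.0.tar.gz HTTP/1.1" 200 1024',
    '198.51.100.7 - - [17/Jan/2013:11:00:00 +0000] "GET /download/tool-1.0.tar.gz HTTP/1.1" 200 1024',
    '198.51.100.7 - - [17/Jan/2013:11:01:00 +0000] "GET /index.html HTTP/1.1" 200 512',
]

downloads = count_downloads(sample)
print(downloads)
```

Distinct hosts are still only a proxy for users: shared proxies undercount, and mirrors or package managers bypass the log entirely.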
Downloads vs Installed Base
Wiggins, A., Howison, J., & Crowston, K. (2009). Heartbeat: Measuring Active User Base and Potential User Interest in FLOSS Projects. In IFIP OSS Conference.
Robert van de Geijn
Measure 1 (Paolo Bientinesi)
“The impact of software S for application C should be the ratio between the time it would take to complete the application without the use of software S and the time it would take to complete it with the software.” (paraphrased)
In other words, it is the productivity multiplication factor. (I am told) Economists call this the "Marginal Utility Price Ratio": the ratio of the marginal utility of the software to its cost.
By this measure, the value of the GotoBLAS would be about 1.0/0.9 ≈ 1.1.
John Stanton, a computational chemist, immediately applauded this measure. So, to users of software, this seems to make sense.
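Measure 1 is a plain ratio of completion times, so it is easy to compute once both times are known. The sketch below uses the slide's own GotoBLAS figures (1.0 without, 0.9 with, in arbitrary time units); the function name `productivity_factor` is an illustrative choice, not a term from the panel.

```python
def productivity_factor(time_without: float, time_with: float) -> float:
    """Ratio of time-to-completion without the software to time with it.

    A value above 1 means the software speeds up the application.
    """
    if time_without <= 0 or time_with <= 0:
        raise ValueError("completion times must be positive")
    return time_without / time_with


# Figures from the slide: 1.0 time units without GotoBLAS, 0.9 with it.
gotoblas_factor = productivity_factor(1.0, 0.9)
print(f"Productivity factor: {gotoblas_factor:.2f}")
```

The hard part in practice is the counterfactual: the time without the software usually has to be estimated rather than measured.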
Measure 2 (Chris Bischof)
“The number of refereed publications in good journals or conferences that credit the use of this resource, and the transitive impact of those publications.”
Google Scholar: 500 citations to papers by Goto in the last 5 years.
ACM DL: ~70 citations in high quality papers in the last 5 years.
I was not able to determine how many papers mention the GotoBLAS without citing the papers, which is really what Chris meant, I think.
Hard to measure the transitive effect…
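One possible reading of "transitive impact" is the set of papers that cite the software's papers directly or through a chain of citations. The sketch below walks a small, entirely hypothetical citation graph breadth-first; the paper identifiers and the `cited_by` mapping are invented for illustration, and real data would have to come from a citation index.

```python
from collections import deque
from typing import Dict, List, Set


def transitive_citers(cited_by: Dict[str, List[str]], root: str) -> Set[str]:
    """Return every paper that cites `root` directly or through a chain of citations."""
    seen: Set[str] = set()
    queue = deque([root])
    while queue:
        paper = queue.popleft()
        for citer in cited_by.get(paper, []):
            if citer != root and citer not in seen:
                seen.add(citer)
                queue.append(citer)
    return seen


# Hypothetical graph: each key is a paper, each value lists the papers citing it.
cited_by = {
    "software_paper": ["A", "B"],
    "A": ["C"],
    "B": ["C", "D"],
}

direct = set(cited_by["software_paper"])
everything = transitive_citers(cited_by, "software_paper")
print("direct:", len(direct), "transitive:", len(everything))
```

A raw transitive count grows quickly and treats a distant citation the same as a direct one, so a weighting by distance may be worth considering.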
Resources
  Apache Foundation Incubation Guidelines: incubator.apache.org/guides/
  “Buzzing Communities” (Richard Millington): Feverbee.com
  Producing Open Source Software (Karl Fogel): producingoss.com
  Impact Story: Impact-Story.org
Counting citations
How are you going to count / list publications?
  One clear citation that is recognizable to engines (Google Scholar, Citeseer, Web of Science)
  vs. adding authors to reward contributions, and new publications to describe innovations
When are you going to ask? At registration, at upgrade, at execution?
What are the standards in your field?
Perhaps 10% of users cite correctly
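If only a fraction of users cite correctly, an observed citation count understates actual use. The sketch below scales an observed count by an assumed correct-citation rate; the 10% rate is the slide's rough guess, the count of 50 is hypothetical, and the linear scaling itself is an assumption rather than an established estimator.

```python
def estimated_uses(observed_citations: int, correct_citation_rate: float) -> float:
    """Scale an observed citation count by the assumed share of users who cite correctly."""
    if not 0 < correct_citation_rate <= 1:
        raise ValueError("rate must be in the interval (0, 1]")
    return observed_citations / correct_citation_rate


# Hypothetical: 50 recognizable citations, with the slide's rough 10% rate.
estimate = estimated_uses(50, 0.10)
print(f"Estimated publications using the software: {estimate:.0f}")
```

Because the rate is itself a guess, the result is best reported as an order of magnitude rather than a count.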