Discussion of a Large-Scale Open Source Data Collection Methodology
Michael Hahsler and Stefan Koch

Presentation transcript:

Discussion of a Large-Scale Open Source Data Collection Methodology
Michael Hahsler and Stefan Koch
Department of Information Business, Vienna University of Economics and BA
Presented at HICSS-38, January 3-6, 2005, Hilton Waikoloa Village, Big Island, Hawaii

Motivation
Bazaar as a new development paradigm
  – Large number of developers
  – Rigorous peer review / parallel debugging
  – Evolving software products / frequent releases
Existing studies
  – Ideological debates
  – In-depth analyses of single projects
Successful project hosting sites
  – Control and manage OSS development
  – Virtual communities
→ Methodology for large-scale quantitative investigations to analyze the new development paradigm

Outline
Methodology
  – Data retrieval
  – Metrics
Possible analyses
  – Community/Project/Participant level
  – Effort and productivity
Discussion of advantages and disadvantages

Data Retrieval
Data sources
  – Version control system (e.g., CVS or SVN)
  – Additional information (e.g., from project web pages, bug tracking systems)
Collected data
  – Consistently aggregated data
  – Easy to access for analyses (e.g., in a relational database)
Example: Sourceforge.net
  – 20,000+ projects (in 2002)
  – Extract information from the summary page of each project
  – 8,791 projects using CVS actively
  – Download of 33 GB of version control information
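The idea of keeping the collected data "easy to access for analyses (e.g., in a relational database)" can be sketched as follows. This is only an illustration: the slides do not give a schema, so the table names, column names, and helper functions below are hypothetical, and the records are assumed to have been parsed from version control logs already.

```python
import sqlite3

# Hypothetical schema: names are illustrative and do not come from the slides.
SCHEMA = """
CREATE TABLE project (
    project_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    status     TEXT              -- development status from the summary page
);
CREATE TABLE commit_log (
    commit_id     INTEGER PRIMARY KEY,
    project_id    INTEGER NOT NULL REFERENCES project(project_id),
    programmer    TEXT NOT NULL,
    file_path     TEXT NOT NULL,
    committed_at  TEXT NOT NULL, -- ISO 8601 timestamp
    lines_added   INTEGER NOT NULL,
    lines_removed INTEGER NOT NULL
);
"""


def build_database(projects, commits):
    """Load already-parsed project and commit records into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO project VALUES (?, ?, ?)", projects)
    conn.executemany(
        "INSERT INTO commit_log (project_id, programmer, file_path, committed_at,"
        " lines_added, lines_removed) VALUES (?, ?, ?, ?, ?, ?)",
        commits,
    )
    conn.commit()
    return conn


def commits_per_project(conn):
    """Return (project name, number of commits, number of programmers) rows."""
    return conn.execute(
        "SELECT p.name, COUNT(c.commit_id), COUNT(DISTINCT c.programmer)"
        " FROM project p LEFT JOIN commit_log c ON c.project_id = p.project_id"
        " GROUP BY p.project_id ORDER BY p.name"
    ).fetchall()
```

Once the data sits in tables like these, per-project and per-programmer aggregates reduce to ordinary SQL queries, which is presumably the kind of access the slide has in mind.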

Data retrieval process for Sourceforge.net
[Figure: data retrieval process, from data sources to collected data]

Metrics
Many metrics are possible. Some commonly used metrics are:
  – Lines of code (LOC, NCSS)
  – Commits (associated with change requests)
  – Participating programmers
  – Active time spent on the project
  – Development indicator (planning, alpha, stable, …)
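A minimal sketch of how such metrics could be derived from commit records is given below. The `Commit` record and the `project_metrics` function are hypothetical names. The slides do not define "active time", so it is assumed here to be the number of calendar days between the first and the last commit, inclusive.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Commit:
    """One commit as it might be parsed from a version control log (hypothetical)."""
    programmer: str
    timestamp: datetime
    lines_added: int
    lines_removed: int


def project_metrics(commits):
    """Aggregate simple project metrics from a non-empty list of commits."""
    if not commits:
        raise ValueError("at least one commit is required")
    times = [c.timestamp for c in commits]
    return {
        "commits": len(commits),
        "programmers": len({c.programmer for c in commits}),
        "loc_added": sum(c.lines_added for c in commits),
        "loc_net": sum(c.lines_added - c.lines_removed for c in commits),
        # Assumption: active time = days from first to last commit, inclusive.
        "active_days": (max(times) - min(times)).days + 1,
    }
```

The development indicator is not computed here because it comes from the project summary page rather than from the version control log.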

Possible Analyses
Participant level
  – Single participant: LOC, commits; activity patterns; programming style
  – Team: distribution of effort (inequality); cooperation on files
Project level
  – SW evolution
  – Coordination
  – Productivity
  – Effort estimation
Community level
  – Distribution of inputs / outputs
  – Relationship of inputs / outputs
  – Co-participation in projects

Examples: Community Level
Available assets
  – Developers
Activity
  – Active time spent on projects
  – Number of commits
  – Collaboration
Outcome
  – Size (e.g., LOC)
  – Value (e.g., number of downloads of a software product)
  – Community (human capital)
[Figure: histogram of project size in Sourceforge.net]
Power-law distributions are common as a result of positive feedback loops

Examples: Project Level
Productivity: Does developer activity depend on project status, teamwork, or a well-known "core" developer?
Programming practices: e.g., software patterns, frameworks
Software evolution
  – Law of SW evolution (for commercial development): growth rate decreases over time because of growing complexity
  – Also valid for OSS? Or do OSS development practices enable super-linear growth?
  – 39% of the Sourceforge.net projects exhibit super-linear growth. Why?
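One way to check a project for super-linear growth is to fit a quadratic curve to its size over time and look at the sign of the quadratic coefficient. The slides do not say how the 39% figure was obtained, so the sketch below is an assumed approach, not the authors' procedure. The function names and the tolerance are hypothetical.

```python
def fit_quadratic(times, sizes):
    """Least-squares fit of size = a + b*t + c*t**2; returns (a, b, c)."""
    if len(times) != len(sizes) or len(times) < 3:
        raise ValueError("need at least three (time, size) observations")
    # Normal equations (X^T X) beta = X^T y for design rows [1, t, t^2].
    power_sums = [sum(t ** k for t in times) for k in range(5)]
    rhs = [sum(y * t ** k for t, y in zip(times, sizes)) for k in range(3)]
    m = [[power_sums[i + j] for j in range(3)] + [rhs[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting on the 3x4 augmented matrix.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("observations do not determine a quadratic")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def growth_pattern(times, sizes, tol=1e-9):
    """Classify growth by the sign of the fitted quadratic coefficient."""
    _, _, c = fit_quadratic(times, sizes)
    if c > tol:
        return "super-linear"
    if c < -tol:
        return "sub-linear"
    return "linear"
```

A positive coefficient means the growth rate itself increases over time, which is the opposite of what the law of software evolution predicts. A real analysis would also test whether the coefficient differs significantly from zero instead of relying on a fixed tolerance.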

Examples: Participant Level
Distribution of effort within a project team
  – Lorenz curves
  – Gini coefficient
Typically, 20% of the developers do 80% of the coding
[Figure: Lorenz curves for the distribution of commits within two projects from Sourceforge.net; x-axis: cumulated programmers, y-axis: cumulated commits]
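The Lorenz curve and the Gini coefficient for the distribution of commits within a team can be computed as in the following sketch. The function names are hypothetical and the input is assumed to be a list with one commit count per programmer.

```python
def lorenz_curve(commits_per_programmer):
    """Return Lorenz curve points (cumulated programmers, cumulated commits)."""
    values = sorted(commits_per_programmer)
    if not values or any(v < 0 for v in values):
        raise ValueError("need a non-empty list of non-negative counts")
    total = sum(values)
    if total == 0:
        raise ValueError("total number of commits must be positive")
    n = len(values)
    points = [(0.0, 0.0)]
    running = 0
    # Programmers are added from least to most active.
    for i, v in enumerate(values, start=1):
        running += v
        points.append((i / n, running / total))
    return points


def gini(commits_per_programmer):
    """Gini coefficient: 1 minus twice the area under the Lorenz curve."""
    points = lorenz_curve(commits_per_programmer)
    # Trapezoidal rule over consecutive Lorenz curve points.
    area = sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
    return 1 - 2 * area
```

A value of 0 means every programmer contributed the same number of commits. With this discrete form, a team of n programmers in which one person made all commits yields (n - 1) / n.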

Effort Estimation
Effort is needed for comparison of OSS to traditional development
For OSS, effort is typically unknown (no time sheets)
Effort estimation for projects on Sourceforge.net:

Model            Median effort per project   Total effort
COCOMO           2.02 PY                     160,020 PY
Rayleigh-Norden  0.69 PY                     5,965 PY
(PY = person-years)

Problems
  – Models are calibrated for commercial software development
  – Models often have restrictions that make them incompatible with OSS
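For orientation, the two model families named in the table can be sketched as follows. The slides do not state which COCOMO variant or which Rayleigh-Norden parameterization was used, so this sketch assumes basic COCOMO in organic mode (effort in person-months = 2.4 × KLOC^1.05) and the standard Norden-Rayleigh staffing curve. The function names are hypothetical, and the code is not meant to reproduce the figures in the table.

```python
import math


def cocomo_basic_effort_py(loc, a=2.4, b=1.05):
    """Basic COCOMO effort in person-years (organic-mode defaults, an assumption)."""
    kloc = loc / 1000
    person_months = a * kloc ** b
    return person_months / 12


def norden_rayleigh_staffing(t, total_effort, peak_time):
    """Staffing level at time t: m(t) = 2*K*a*t*exp(-a*t^2), with a = 1/(2*t_peak^2)."""
    shape = 1 / (2 * peak_time ** 2)
    return 2 * total_effort * shape * t * math.exp(-shape * t ** 2)


def norden_rayleigh_cumulative(t, total_effort, peak_time):
    """Effort expended up to time t: K * (1 - exp(-a*t^2))."""
    shape = 1 / (2 * peak_time ** 2)
    return total_effort * (1 - math.exp(-shape * t ** 2))
```

COCOMO needs only the project size, while the Norden-Rayleigh curve needs the time of peak staffing, which for OSS has to be estimated from activity data such as commits per period. Both sets of constants stem from commercial development, which is the calibration problem the slide points out.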

Discussion
Collect data automatically for a large number of projects developed by a community
Advantages:
  – Starting point for effort estimation
  – Allows statistical tests for differences between projects
  – Analyzing community aspects
Drawbacks:
  – Uncertainty about data quality
  – Need to adapt retrieval component for each hosting site/service