Characterizing the Open Source Software Process: a Horizontal Study
A. Capiluppi, P. Lago, M. Morisio
Politecnico di Torino


Politecnico di Torino Andrea Capiluppi Characterizing the Open Source Software Process: a Horizontal Study A. Capiluppi, P. Lago, M. Morisio

Slide 2: Outline
- Rationale behind the current study
- Methodology
- Conclusions
- Current and future work

Slide 3: Rationale
- Most Open Source analyses focus on a single, flagship project (Linux, Apache, GNOME)
- Limitation: the conclusions are based on a ‘vertical’ study
- There is a lack of ‘horizontal’ studies covering:
  - a pool of projects
  - a wider area of interest

Slide 4: Methodology
- Choice of projects
- Attributes definition
- Coding
- Analysis

Slide 5: Choice of projects: repository
- We selected the FreshMeat repository
- FreshMeat (http://freshmeat.net) has focused on Open Source development since 1996
- It gathers thousands of projects, either duplicated on the pages of SourceForge or hosted on FreshMeat only
- FreshMeat lists more than [...] projects (many inactive)

Slide 6: Choice of projects: sampling I
- From [...] to [...] projects: how?
- FreshMeat organizes projects by filters and categories
  - Filter = “Topic”
  - Categories = {“Internet”, “Database”, “Multimedia”, …}
- Other filters: Programming language, Topic (i.e. application domain), Status of Evolution, etc.

Slide 7: Choice of projects: sampling II
- We randomly picked a number of projects through the “Status” filter
- Rationale: it has a limited number of associated categories: {“Planning”, “PreAlpha”, “Alpha”, “Beta”, “Stable”, “Mature”}
- The overall count is 406 projects
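The sampling step can be sketched as a random draw within each “Status” category. This is only an illustration: the slides do not say how many projects were drawn per category, so the equal quota, the `(name, status)` input shape, the fixed seed and the made-up project names are all assumptions.

```python
import random

# The six "Status" categories named on the slide.
STATUSES = ["Planning", "PreAlpha", "Alpha", "Beta", "Stable", "Mature"]


def sample_by_status(projects, per_status, seed=0):
    """Randomly pick up to `per_status` projects from each status category.

    `projects` is a list of (name, status) pairs. The equal quota per
    category and the seed are assumptions, not the authors' procedure.
    """
    rng = random.Random(seed)  # seeded so the draw is reproducible
    sample = []
    for status in STATUSES:
        pool = [name for name, s in projects if s == status]
        k = min(per_status, len(pool))
        sample.extend((name, status) for name in rng.sample(pool, k))
    return sample


# Hypothetical listing: 20 made-up projects per status.
listing = [(f"{s.lower()}-{i}", s) for s in STATUSES for i in range(20)]
picked = sample_by_status(listing, per_status=5)
print(len(picked))
```

Drawing within each category keeps every development stage represented, which a plain random draw over the whole listing would not guarantee.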

Slide 8: Attribute definition
- Age
- Application domain
- Programming language
- Size [KB]
- Number of developers
- Stable and transient developers
- Number of users
- Modularity level
- Documentation level
- Popularity
- Status
- Success of project
- Vitality
Legend: red = defined by FreshMeat, black = defined by us

Slide 9: Coding
- Each attribute was coded twice, to capture evolutionary trends
- First observation: January 2002
- Second observation: July 2002

Slide 10: Analysis
Here we discuss:
- Application domain issues
- Developers [stable & transient] issues
- Subscribers (as users) issues
- Code size issues

Slide 11: Application domain distribution

Slide 12: Attributes: project’s developers
- We evaluate how many people write code for an application
- External contributions are always credited in special-purpose files, or in the ChangeLog
- We distinguish between:
  - Stable developers
  - Transient developers
  - Core team: more than one stable developer
- Method: manual inspections and pattern-recognition scripts
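A pattern-recognition script of the kind mentioned above might look like the following sketch. It assumes a GNU-style ChangeLog whose entry headers read `YYYY-MM-DD  Name  <email>`. The slides do not define how stable and transient developers are told apart, so the `min_entries` threshold is a hypothetical rule chosen only for illustration, and the sample names are made up.

```python
import re
from collections import Counter

# GNU-style ChangeLog entry header, e.g.
# "2002-01-15  Jane Doe  <jane@example.org>" (format is an assumption).
ENTRY_HEADER = re.compile(r"^(\d{4}-\d{2}-\d{2})\s+(.+?)\s+<[^>]+>\s*$")


def count_entries_per_author(changelog_text):
    """Count ChangeLog entries credited to each author name."""
    counts = Counter()
    for line in changelog_text.splitlines():
        match = ENTRY_HEADER.match(line)
        if match:
            counts[match.group(2)] += 1
    return counts


def split_stable_transient(counts, min_entries=3):
    """Hypothetical rule: `min_entries` or more entries means 'stable'."""
    stable = sorted(a for a, n in counts.items() if n >= min_entries)
    transient = sorted(a for a, n in counts.items() if n < min_entries)
    return stable, transient


# Made-up ChangeLog excerpt used only to exercise the functions.
SAMPLE_CHANGELOG = (
    "2002-06-30  Jane Doe  <jane@example.org>\n"
    "\n"
    "\t* main.c: Fix crash on empty input.\n"
    "\n"
    "2002-05-12  Sam Roe  <sam@example.org>\n"
    "\n"
    "\t* README: Fix typo.\n"
    "\n"
    "2002-03-02  Jane Doe  <jane@example.org>\n"
    "\n"
    "\t* parser.c: Add option handling.\n"
    "\n"
    "2002-01-15  Jane Doe  <jane@example.org>\n"
    "\n"
    "\t* Initial import.\n"
)

counts = count_entries_per_author(SAMPLE_CHANGELOG)
stable, transient = split_stable_transient(counts)
has_core_team = len(stable) > 1  # slide: more than one stable developer
print(stable, transient, has_core_team)
```

Real ChangeLogs vary widely in format, which is presumably why the slide pairs such scripts with manual inspection.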

Slide 13: Developers over projects
We observe:
- 72% of projects have a single stable developer
- 80% of projects have at most 10 developers

Slide 14: Developers distribution over projects

Slide 15: Definition: clusters of developers
- Cluster 1: 1 to 3 developers (64.5%)
- Cluster 2: 4 to 10 developers (20%)
- Cluster 3: 11 to 20 developers (9.5%)
  - “Average” nr. of stable dev: 2
  - “Average” nr. of transient dev: 3
- Cluster 4: more than 20 developers (6%)
  - “Average” nr. of stable dev: 6
  - “Average” nr. of transient dev: 19
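The cluster boundaries above translate directly into a small lookup. The function and variable names are illustrative, and the developer counts in the example are made up.

```python
from collections import Counter


def developer_cluster(n_developers):
    """Map a project's developer count to the cluster number on the slide."""
    if n_developers < 1:
        raise ValueError("a project needs at least one developer")
    if n_developers <= 3:
        return 1
    if n_developers <= 10:
        return 2
    if n_developers <= 20:
        return 3
    return 4


def cluster_shares(developer_counts):
    """Return the fraction of projects falling in each cluster."""
    tally = Counter(developer_cluster(n) for n in developer_counts)
    total = len(developer_counts)
    return {cluster: tally[cluster] / total for cluster in (1, 2, 3, 4)}


# Hypothetical developer counts for eight projects.
shares = cluster_shares([1, 1, 2, 3, 5, 9, 14, 25])
print(shares)
```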

Slide 16: Productivity vs. ‘global’ developers

Slide 17: Productivity vs. ‘stable’ developers

Slide 18: Code variation over clusters

Slide 19: Attributes: subscribers
- We use publicly available data as a proxy for the number of users
- Users ~ mailing list subscribers (public datum)
- It is not a monotonic measure: subscribers can leave as well as join
- We have a measure of users at two different observations

Slide 20: Distribution of subscribers over projects
- Around 42% of projects have at most 1 subscriber-user

Slide 21: Users evolution

Slide 22: Attributes: project’s size
- We evaluate the code of each project twice
- The code evaluated is the code contained in the packages
- We exclude from the count:
  - Auxiliary files: documentation, configuration files, GIF files, etc.
  - Legacy code: inherited libraries (e.g. Gnome macros), internationalization code
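One way to approximate this size measure is to walk an unpacked package and sum only the source files. The sketch below is not the authors' tool: the list of source suffixes and the names of the excluded folders are assumptions, and the demo package is built in a temporary directory with made-up files.

```python
import tempfile
from pathlib import Path

# Assumptions: which suffixes count as source code, and which folder
# names hold documentation, inherited macros or internationalization code.
SOURCE_SUFFIXES = {".c", ".h", ".cpp", ".java", ".pl", ".py"}
EXCLUDED_DIRS = {"doc", "docs", "intl", "po", "macros"}


def source_size_kb(package_root):
    """Sum the size in KB of source files, skipping excluded folders."""
    root = Path(package_root)
    total_bytes = 0
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        parent_dirs = path.relative_to(root).parts[:-1]
        if any(part in EXCLUDED_DIRS for part in parent_dirs):
            continue  # legacy or auxiliary folder
        if path.suffix.lower() in SOURCE_SUFFIXES:
            total_bytes += path.stat().st_size
    return total_bytes / 1024


# Build a tiny made-up package to exercise the function.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src").mkdir()
    (root / "intl").mkdir()
    (root / "src" / "main.c").write_bytes(b"x" * 2048)     # counted
    (root / "README").write_bytes(b"x" * 4096)             # auxiliary
    (root / "logo.gif").write_bytes(b"x" * 1024)           # auxiliary
    (root / "intl" / "gettext.c").write_bytes(b"x" * 512)  # legacy
    demo_size_kb = source_size_kb(root)

print(demo_size_kb)
```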

Slide 23: Distribution of code size over projects

Slide 24: Size changes between the two observations

Slide 25: Conclusions I
- The vast majority of projects are developed by only one developer
- Adding people to a project has a small effect on productivity (i.e. code per developer)
- Open Source software is made by experts for experts (72% of horizontal projects have more than 10 developers)
- 58% of projects didn’t change their size
- 63% of projects had a change within 1%
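The two quantities behind these conclusions, code per developer and the size change between the two observations, can be written out as follows. This is a sketch under stated assumptions: the function names are illustrative and the project sizes in the example are made up, not the study's data.

```python
def productivity_kb_per_developer(size_kb, n_developers):
    """Code per developer, the productivity proxy named on the slide."""
    if n_developers <= 0:
        raise ValueError("n_developers must be positive")
    return size_kb / n_developers


def relative_size_change(size_first_kb, size_second_kb):
    """Relative change in size between the first and second observation."""
    if size_first_kb <= 0:
        raise ValueError("size_first_kb must be positive")
    return (size_second_kb - size_first_kb) / size_first_kb


def share_within(changes, threshold):
    """Fraction of projects whose absolute change is at most `threshold`."""
    return sum(abs(c) <= threshold for c in changes) / len(changes)


# Hypothetical (January, July) sizes in KB for four projects.
observations = [(100, 100), (200, 201), (50, 75), (400, 380)]
changes = [relative_size_change(a, b) for a, b in observations]

unchanged_share = share_within(changes, threshold=0.0)
within_one_percent = share_within(changes, threshold=0.01)
print(unchanged_share, within_one_percent)
```

In this made-up sample one project in four is unchanged and two in four stay within 1%, which mirrors how the 58% and 63% figures relate: the second group contains the first.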

Slide 26: Conclusions II
- Java is relevant for 8% of the projects, C/C++ for 56%, Perl and Python together for 20%
- Observations from flagship projects (Apache, Linux, Gnome) are not confirmed for an average Open Source project
- Several projects are white noise, to be filtered out
- There is a huge amount of data in public repositories: empirical researchers have an invaluable resource of software data

Slide 27: Current and future work
- Eliminating white noise: only projects in clusters 3 and 4 have been selected
- Deeper analysis: the whole history of each project is being studied
- What can we say with respect to the conclusions drawn on bigger OSS projects?
- What can be said about OSS evolution compared with traditional software evolution?

Slide 28: Attributes: age of a project
- Age: time interval since a project’s first posting on the FreshMeat pages
- This measure is a proxy for its actual development time
- We are ignoring all its earlier history
- Most open source projects have a short history behind them
- Most open source projects keep a ChangeLog with the date of the first release
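As a worked example of the age attribute, the interval can be computed at each of the two observations. The slides give only the months of the observations, so taking the first day of each month is an assumption, and the first-posting date is made up.

```python
from datetime import date

# Assumption: each observation is taken on the first day of its month.
FIRST_OBSERVATION = date(2002, 1, 1)
SECOND_OBSERVATION = date(2002, 7, 1)


def project_age_days(first_posting, observation):
    """Days elapsed between the first FreshMeat posting and an observation."""
    if observation < first_posting:
        raise ValueError("observation precedes the first posting")
    return (observation - first_posting).days


# Hypothetical project first posted on 1 July 2000.
posted = date(2000, 7, 1)
age_first = project_age_days(posted, FIRST_OBSERVATION)
age_second = project_age_days(posted, SECOND_OBSERVATION)
print(age_first, age_second)
```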

Slide 29: Attributes: modularity
- How do we define the modules composing an application?
- We define three levels of modularization:
  - Poor modularization (at most 1 folder containing source code)
  - Average modularization (at most 2 folders, typically ‘src’ and ‘lib’ folders)
  - Adequate modularization (more than 3 folders)
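The three levels can be sketched as a function of the number of folders that contain source code. Two points are assumptions rather than part of the slides: the set of suffixes treated as source code, and the handling of exactly three folders, which the thresholds above leave unassigned and which this sketch groups with "adequate". The file paths in the example are made up.

```python
from pathlib import PurePosixPath

# Assumption: which file suffixes count as source code.
SOURCE_SUFFIXES = {".c", ".h", ".cpp", ".java", ".pl", ".py"}


def source_folders(file_paths):
    """Return the set of folders that directly contain source files."""
    return {
        str(PurePosixPath(p).parent)
        for p in file_paths
        if PurePosixPath(p).suffix.lower() in SOURCE_SUFFIXES
    }


def modularity_level(n_source_folders):
    """Map a source-folder count to the modularity levels on the slide.

    Assumption: exactly 3 folders is treated as 'adequate', since the
    slide does not say where that case belongs.
    """
    if n_source_folders <= 1:
        return "poor"
    if n_source_folders == 2:
        return "average"
    return "adequate"


# Hypothetical package listing.
files = ["src/main.c", "src/util.c", "lib/io.c", "README", "doc/manual.txt"]
folders = source_folders(files)
level = modularity_level(len(folders))
print(sorted(folders), level)
```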

Slide 30: Attributes: popularity and vitality
- Popularity: composite measure of an application’s success among users
- Vitality: composite measure of an application’s success in development
- Together they give a first definition of the success of a project
- We are interested in observing how these two attributes evolve together