1 Politecnico di Torino Andrea Capiluppi andrea.capiluppi@polito.it Characterizing the Open Source Software Process: a Horizontal Study A. Capiluppi, P. Lago, M. Morisio

2 Outline: Rationale behind the current study; Methodology; Conclusions; Current and future work

3 Rationale. Most Open Source analyses focus on a single flagship project (Linux, Apache, GNOME). Limitation: the conclusions are based on a ‘vertical’ study. There is a lack of ‘horizontal’ studies: a pool of projects, a wider area of interest.

4 Methodology: Choice of projects; Attributes definition; Coding; Analysis

5 Choice of projects: repository. We selected the FreshMeat repository. FreshMeat (http://freshmeat.net) has focused on Open Source development since 1996. It gathers thousands of projects, either duplicated on the pages of SourceForge (http://sourceforge.net) or hosted on FreshMeat only. FreshMeat lists more than 24000 projects (many inactive).

6 Choice of projects: sampling I. From 24000 to 406: how? FreshMeat organizes projects by filters and categories. Filter = “Topic”; Categories = {“Internet”, “Database”, “Multimedia”, …}. Other filters: Programming language, Topic (i.e. application domain), Status of Evolution, etc.

7 Choice of projects: sampling II. We randomly picked a number of projects through the “Status” filter. Rationale: it has a limited number of associated categories: {“Planning”, “PreAlpha”, “Alpha”, “Beta”, “Stable”, “Mature”}. The overall count is 406 projects.
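
The random pick through the “Status” filter can be sketched as a per-category draw. This is only an illustration: the slides do not say how many projects were taken from each status, so the `per_status` quota, the `(name, status)` input shape, and the fixed seed are all assumptions.

```python
import random

# Status categories as listed on the slide.
STATUSES = ["Planning", "PreAlpha", "Alpha", "Beta", "Stable", "Mature"]


def sample_by_status(projects, per_status, seed=0):
    """Draw up to `per_status` random projects from each status category.

    `projects` is a list of (name, status) pairs. The quota and the seed
    are hypothetical; the original study does not state them.
    """
    rng = random.Random(seed)
    sample = []
    for status in STATUSES:
        pool = [name for name, s in projects if s == status]
        k = min(per_status, len(pool))  # a category may hold fewer projects
        sample.extend(rng.sample(pool, k))
    return sample
```

Drawing per category keeps sparsely populated statuses such as “Planning” represented, which a single draw over the whole repository would not guarantee.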

8 Attribute definition (on the original slide, red: defined by FreshMeat; black: defined by us): Age; Application domain; Programming language; Size [KB]; Number of developers; Stable and transient developers; Number of users; Modularity level; Documentation level; Popularity; Status; Success of project; Vitality

9 Coding. Each attribute was coded twice, to capture evolutionary trends. First observation: January 2002. Second observation: July 2002.

10 Analysis. Here we discuss: application domain issues; developer (stable and transient) issues; subscriber (as users) issues; code size issues.

11 Application domain distribution

12 Attributes: project’s developers. We evaluate how many people write code for an application. External contributions are always credited in special-purpose files or in the ChangeLog. We distinguish between stable developers and transient developers. Core team: more than one stable developer. Method: manual inspections and pattern-recognition scripts.
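
The slides do not show the pattern-recognition scripts. As a rough sketch of the idea, the snippet below counts entries per contributor in a ChangeLog, assuming the file follows the GNU-style entry header (`YYYY-MM-DD  Name  <email>`). The function name and the regular expression are illustrative, not the authors' scripts, and real ChangeLogs vary enough that manual inspection would still be needed.

```python
import re
from collections import Counter

# Assumed GNU-style entry header: "2002-01-15  Jane Doe  <jane@example.org>"
ENTRY_HEADER = re.compile(
    r"^\d{4}-\d{2}-\d{2}\s+(.+?)\s+<[^>]+>\s*$", re.MULTILINE
)


def changelog_authors(changelog_text):
    """Count ChangeLog entries per credited contributor name."""
    return Counter(name.strip() for name in ENTRY_HEADER.findall(changelog_text))
```

The per-name counts could then feed a stable/transient distinction, though the slides give no threshold for that split.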

13 Developers over projects. We observe: 72% of projects have a single stable developer; 80% of projects have at most 10 developers.

14 Developers distribution over projects

15 Definition: clusters of developers. Cluster 1: 1 to 3 developers (64.5%). Cluster 2: 4 to 10 developers (20%). Cluster 3: 11 to 20 developers (9.5%). “Average” nr. of stable dev: 2. “Average” nr. of transient dev: 3. Cluster 4: more than 20 developers (6%). “Average” nr. of stable dev: 6. “Average” nr. of transient dev: 19.
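
The cluster boundaries above translate directly into a small lookup. A minimal sketch, assuming the count passed in is the total number of developers (stable plus transient), which the slide does not state explicitly:

```python
def developer_cluster(n_developers):
    """Map a project's developer count to cluster 1-4 as defined on the slide."""
    if n_developers < 1:
        raise ValueError("a project needs at least one developer")
    if n_developers <= 3:
        return 1  # 1 to 3 developers
    if n_developers <= 10:
        return 2  # 4 to 10 developers
    if n_developers <= 20:
        return 3  # 11 to 20 developers
    return 4      # more than 20 developers
```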

16 Productivity vs. ‘global’ developers

17 Productivity vs. ‘stable’ developers

18 Code variation over clusters

19 Attributes: subscribers. We use publicly available data as a proxy for users: Users ~ mailing list subscribers (public datum). It is not a monotonic measure: subscribers can leave as well as join. We measure users at two different observations.

20 Distribution of subscribers over projects. Around 42% of projects have at most 1 subscriber-user.

21 Users evolution

22 Attributes: project’s size. We evaluate the code of each project twice. The code evaluated is the code contained in packages. We exclude from the count: auxiliary files (documentation, configuration files, GIF files, etc.); legacy code (inherited libraries, e.g. Gnome macros, and internationalization code).
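
A size count with these exclusions might look like the sketch below. The extension whitelist and the excluded directory names are assumptions chosen for illustration; the slides name the kinds of files left out but not the exact rules used.

```python
import os

# Hypothetical whitelist of source-code extensions (not from the slides).
SOURCE_EXTENSIONS = {".c", ".h", ".cpp", ".java", ".pl", ".py"}
# Hypothetical directory names holding documentation, i18n, or inherited macros.
EXCLUDED_DIRS = {"doc", "docs", "po", "intl", "macros"}


def source_size_kb(root):
    """Sum the size in KB of source files under `root`, skipping excluded dirs."""
    total_bytes = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in EXCLUDED_DIRS]
        for filename in filenames:
            extension = os.path.splitext(filename)[1].lower()
            if extension in SOURCE_EXTENSIONS:
                total_bytes += os.path.getsize(os.path.join(dirpath, filename))
    return total_bytes / 1024
```

Running the same function on the package from each observation gives the two size values that the evolution analysis compares.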

23 Distribution of code size over projects

24 Evolutionary observations of size changes

25 Conclusions I. The vast majority of projects are developed by only one developer. Adding people to a project has a small effect on productivity (i.e. code per developer). Open Source software is made by experts for experts (72% of horizontal projects have more than 10 developers). 58% of projects didn’t change their size. 63% of projects had a change within 1%.
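
The two size figures (no change, change within 1%) rest on comparing the size at the first and second observation. A minimal sketch of that comparison, assuming the change is measured relative to the first observation; the function names and labels are illustrative:

```python
def relative_size_change(size_first_kb, size_second_kb):
    """Relative change in size between the two observations."""
    if size_first_kb <= 0:
        raise ValueError("first observation must have a positive size")
    return (size_second_kb - size_first_kb) / size_first_kb


def classify_change(size_first_kb, size_second_kb, tolerance=0.01):
    """Label a project as unchanged, changed within tolerance, or changed."""
    change = relative_size_change(size_first_kb, size_second_kb)
    if change == 0:
        return "unchanged"
    if abs(change) <= tolerance:
        return "within tolerance"
    return "changed"
```

Under this reading, a project counted as unchanged also falls within the 1% band, which would be consistent with the 63% figure including the 58%.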

26 Conclusions II. Java is relevant for 8% of the projects, C/C++ for 56%, Perl and Python together for 20%. Observations from flagship projects (Apache, Linux, GNOME) are not confirmed for an average Open Source project. Several projects are white noise, to be filtered out. There is a huge amount of data in public repositories: empirical researchers have an invaluable resource of software data.

27 Current and future work. Eliminating white noise: only projects in clusters 3 and 4 have been selected. Deeper analysis: the whole history of each project is being studied. What can we say with respect to the conclusions on bigger OS projects? What can be said about OSS evolution compared with traditional software evolution?

28 Attributes: age of a project. Age: time interval since a project’s first posting on the FreshMeat pages. This measure is a proxy of its actual development time. We are ignoring all its past history. Most open source projects have a short history behind them. Most open source projects keep a ChangeLog with the first release date.
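
The age attribute reduces to a date difference. In the sketch below the unit (days) and the function name are assumptions, since the slides give only the definition:

```python
from datetime import date


def project_age_days(first_posting, observation):
    """Days between a project's first FreshMeat posting and an observation date."""
    if observation < first_posting:
        raise ValueError("observation precedes the first posting")
    return (observation - first_posting).days
```

For example, `project_age_days(date(2001, 1, 1), date(2002, 1, 1))` gives 365. Where a ChangeLog records the first release date, that date could replace `first_posting` to approximate the actual development time more closely.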

29 Attributes: modularity. How do we define the modules composing an application? We define three levels: poor modularization (at most 1 folder containing source code); average modularization (at most 2 folders, typically ‘src’ and ‘lib’ folders); adequate modularization (more than 3 folders).
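
The three levels can be written as a threshold check on the number of source folders. The slide's thresholds leave exactly three folders unassigned, so this sketch returns `None` for that case rather than guessing; the function name and level labels are illustrative.

```python
def modularity_level(n_source_folders):
    """Classify modularity from the number of folders containing source code."""
    if n_source_folders < 0:
        raise ValueError("folder count cannot be negative")
    if n_source_folders <= 1:
        return "poor"
    if n_source_folders <= 2:
        return "average"
    if n_source_folders > 3:
        return "adequate"
    return None  # exactly 3 folders: not covered by the slide's definition
```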

30 Attributes: popularity and vitality. Popularity: composite measure of an application’s success among users. Vitality: composite measure of an application’s success in development. Together they give a first definition of the success of a project. We are interested in observing how these two attributes evolve together.

