Published by Nathaniel Moody. Modified over 8 years ago.
1
Politecnico di Torino Andrea Capiluppi andrea.capiluppi@polito.it Characterizing the Open Source Software Process: a Horizontal Study A. Capiluppi, P. Lago, M. Morisio
2
Outline
- Rationale behind the current study
- Methodology
- Conclusions
- Current and future work
3
Rationale
- Most Open Source analyses focus on a single flagship project (Linux, Apache, GNOME)
- Limitation: the conclusions are based on a ‘vertical’ study
- There is a lack of ‘horizontal’ studies covering:
  - a pool of projects
  - a wider area of interest
4
Methodology
- Choice of projects
- Attribute definition
- Coding
- Analysis
5
Choice of projects: repository
- We selected the FreshMeat repository
- FreshMeat (http://freshmeat.net) has focused on Open Source development since 1996
- It gathers thousands of projects, either duplicated on SourceForge (http://sourceforge.net) or hosted on FreshMeat only
- FreshMeat lists more than 24,000 projects (many inactive)
6
Choice of projects: sampling I
- From 24,000 to 406: how?
- FreshMeat organizes projects by filters and categories
  - Example filter: “Topic”
  - Its categories: {“Internet”, “Database”, “Multimedia”, …}
- Available filters: Programming language, Topic (i.e. application domain), Status of evolution, etc.
7
Choice of projects: sampling II
- We randomly picked a number of projects through the “Status” filter
- Rationale: this filter has a limited number of associated categories: {“Planning”, “PreAlpha”, “Alpha”, “Beta”, “Stable”, “Mature”}
- The overall count is 406 projects
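The sampling step can be illustrated with a small stratified random draw over the six status categories. This is only a sketch: the slides do not say how many projects were drawn per status, so `per_status`, the `(name, status)` input format, and the fixed seed are assumptions made for illustration.

```python
import random
from collections import defaultdict

# Status categories as listed on the slide.
STATUSES = ["Planning", "PreAlpha", "Alpha", "Beta", "Stable", "Mature"]


def sample_by_status(projects, per_status, seed=0):
    """Draw up to `per_status` random projects from each status category.

    `projects` is an iterable of (name, status) pairs (hypothetical format).
    `per_status` is an assumed parameter; the slides give only the total (406).
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    by_status = defaultdict(list)
    for name, status in projects:
        by_status[status].append(name)

    sample = {}
    for status in STATUSES:
        pool = by_status.get(status, [])
        sample[status] = rng.sample(pool, min(per_status, len(pool)))
    return sample
```

Sampling within each status category keeps every development stage represented, which a plain random draw over all listed projects would not guarantee.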
8
Attribute definition
- Age
- Application domain
- Programming language
- Size [KB]
- Number of developers
- Stable and transient developers
- Number of users
- Modularity level
- Documentation level
- Popularity
- Status
- Success of project
- Vitality
Legend: red = defined by FreshMeat; black = defined by us
9
Coding
- Each attribute was coded twice, to capture evolutionary trends
- First observation: January 2002
- Second observation: July 2002
10
Analysis
Here we discuss:
- Application domain issues
- Developer issues (stable and transient)
- Subscriber (as users) issues
- Code size issues
11
Application domain distribution
12
Attributes: project’s developers
- We evaluate how many people write code for an application
- External contributions are always credited in special-purpose files or in the ChangeLog
- We distinguish between:
  - Stable developers
  - Transient developers
- Core team: more than one stable developer
- Method: manual inspections and pattern-recognition scripts
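A pattern-recognition script of the kind mentioned above might look like the following sketch. It assumes GNU-style ChangeLog headers (`YYYY-MM-DD  Name  <email>`), which not every project uses. The slides also do not state how stable and transient developers were told apart, so the `min_entries` threshold is a purely hypothetical criterion.

```python
import re
from collections import Counter

# Assumed GNU-style entry header: "2002-01-15  Jane Doe  <jane@example.org>"
HEADER = re.compile(r"^(\d{4}-\d{2}-\d{2})\s+(.+?)\s+<([^>]+)>\s*$")


def count_entries(changelog_text):
    """Count ChangeLog entries per credited author name."""
    counts = Counter()
    for line in changelog_text.splitlines():
        match = HEADER.match(line)
        if match:
            counts[match.group(2)] += 1
    return counts


def split_developers(counts, min_entries=3):
    """Split authors into (stable, transient) lists.

    Hypothetical rule: an author with at least `min_entries` entries is
    treated as stable. This threshold is not taken from the slides.
    """
    stable = sorted(name for name, n in counts.items() if n >= min_entries)
    transient = sorted(name for name, n in counts.items() if n < min_entries)
    return stable, transient
```

In practice such a script would only give a first pass; the slides pair it with manual inspection, since credits also appear in special-purpose files with no fixed format.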
13
Developers over projects
We observe:
- 72% of projects have a single stable developer
- 80% of projects have at most 10 developers
14
Developers distribution over projects
15
Definition: clusters of developers
- Cluster 1: 1 to 3 developers (64.5%)
- Cluster 2: 4 to 10 developers (20%)
- Cluster 3: 11 to 20 developers (9.5%)
  - “Average” number of stable developers: 2
  - “Average” number of transient developers: 3
- Cluster 4: more than 20 developers (6%)
  - “Average” number of stable developers: 6
  - “Average” number of transient developers: 19
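The cluster boundaries above translate directly into a lookup function. The sketch below uses hypothetical function names, and the share computation is only an illustration of how the percentages per cluster could be derived from a list of developer counts.

```python
def developer_cluster(n_developers):
    """Map a project's developer count to clusters 1-4 as defined on the slide."""
    if n_developers < 1:
        raise ValueError("a project needs at least one developer")
    if n_developers <= 3:
        return 1
    if n_developers <= 10:
        return 2
    if n_developers <= 20:
        return 3
    return 4


def cluster_shares(developer_counts):
    """Return the percentage of projects falling in each cluster."""
    totals = {1: 0, 2: 0, 3: 0, 4: 0}
    for n in developer_counts:
        totals[developer_cluster(n)] += 1
    n_projects = len(developer_counts)
    return {c: 100 * k / n_projects for c, k in totals.items()}
```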
16
Productivity vs. ‘global’ developers
17
Productivity vs. ‘stable’ developers
18
Code variation over clusters
19
Attributes: subscribers
- We use publicly available data as a proxy for the number of users
- Users ~ mailing list subscribers (public datum)
- It is not a monotonic measure: subscribers can leave as well as join
- We measure users at two different observations
20
Distribution of subscribers over projects
- Around 42% of projects have at most 1 subscriber-user
21
Users evolution
22
Attributes: project’s size
- We evaluate the code of each project twice
- The code evaluated is the code contained in packages
- We exclude from the count:
  - Auxiliary files: documentation, configuration files, GIF files, etc.
  - Legacy code: inherited libraries (e.g. Gnome macros), internationalization code
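The size measurement can be sketched as a directory walk over an unpacked package that skips auxiliary files and legacy code. The extension set and directory names below are hypothetical stand-ins for the exclusion rules; the slides name only the categories, not how they were detected.

```python
import os

# Hypothetical markers for auxiliary files (documentation, configuration, images).
AUX_EXTENSIONS = {".txt", ".html", ".gif", ".png", ".conf", ".cfg"}
# Hypothetical directory names for documentation, macros and internationalization.
EXCLUDED_DIRS = {"doc", "docs", "intl", "po", "macros"}


def source_size_kb(root):
    """Sum the size in KB of files under `root`, skipping excluded material."""
    total_bytes = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d.lower() not in EXCLUDED_DIRS]
        for name in filenames:
            extension = os.path.splitext(name)[1].lower()
            if extension in AUX_EXTENSIONS:
                continue
            total_bytes += os.path.getsize(os.path.join(dirpath, name))
    return total_bytes / 1024
```

Running the same function on the January and July packages of a project would give the two size observations that the study compares.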
23
Distribution of code size over projects
24
Evolutionary observations of size changes
25
Conclusions I
- The vast majority of projects are developed by only one developer
- Adding people to a project has a small effect on productivity (i.e. code per developer)
- Open Source software is made by experts for experts (72% of projects have a single stable developer)
- 58% of projects did not change in size
- 63% of projects changed in size by at most 1%
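The two size figures in these conclusions come from comparing the January and July observations of each project. A minimal sketch of that comparison follows. It assumes sizes in KB as `(january, july)` pairs and that the "within 1%" group includes the unchanged projects; neither detail is spelled out on the slides.

```python
def relative_change_pct(size_first_kb, size_second_kb):
    """Percentage change in size between the two observations."""
    if size_first_kb <= 0:
        raise ValueError("first observation must have a positive size")
    return (size_second_kb - size_first_kb) / size_first_kb * 100


def summarize_size_changes(sizes, tolerance_pct=1.0):
    """Share of projects with no change, and with a change within the tolerance.

    `sizes` is a list of (first_kb, second_kb) pairs (hypothetical format).
    Assumption: unchanged projects also count as within the tolerance.
    """
    changes = [relative_change_pct(first, second) for first, second in sizes]
    n_projects = len(changes)
    unchanged = sum(1 for c in changes if c == 0)
    within = sum(1 for c in changes if abs(c) <= tolerance_pct)
    return {
        "unchanged_pct": 100 * unchanged / n_projects,
        "within_tolerance_pct": 100 * within / n_projects,
    }
```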
26
Conclusions II
- Java is relevant for 8% of the projects, C/C++ for 56%, Perl and Python for 20%
- Observations from flagship projects (Apache, Linux, Gnome) are not confirmed for an average Open Source project
- Several projects are white noise and should be filtered out
- Public repositories hold a huge amount of data: an invaluable resource of software data for empirical researchers
27
Current and future work
- Eliminating white noise: only projects in clusters 3 and 4 have been selected
- Deeper analysis: the whole history of each project is being studied
- What can we say with respect to the conclusions drawn on bigger OS projects?
- What can be said about OSS evolution compared with traditional software evolution?
28
Attributes: age of a project
- Age: time interval since a project’s first posting on the FreshMeat pages
- This measure is a proxy for its actual development time
- We are ignoring all its prior history
- Most open source projects have a short history behind them
- Most open source projects keep a ChangeLog with the date of the first release
29
Attributes: modularity
- How do we define the modules composing an application?
- We define three levels:
  - Poor modularization (at most 1 folder containing source code)
  - Average modularization (at most 2 folders, typically ‘src’ and ‘lib’ folders)
  - Adequate modularization (more than 3 folders)
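The three levels can be written as a small classifier over the number of folders containing source code. One caveat shows up when doing so: the thresholds as stated leave the case of exactly 3 folders unassigned. The sketch below makes that gap explicit by returning `None` rather than guessing; the function name is hypothetical.

```python
def modularity_level(source_folders):
    """Classify a project by its number of folders containing source code.

    Thresholds follow the slide. Exactly 3 folders is not covered by the
    stated levels, so None is returned for that case instead of a guess.
    """
    if source_folders < 0:
        raise ValueError("folder count cannot be negative")
    if source_folders <= 1:
        return "poor"
    if source_folders == 2:
        return "average"
    if source_folders > 3:
        return "adequate"
    return None  # exactly 3 folders: unspecified on the slide
```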
30
Attributes: popularity and vitality
- Popularity: composite measure of an application’s success among users
- Vitality: composite measure of an application’s success in development
- Together they give a first definition of the success of a project
- We are interested in observing how these two attributes evolve together